A More Robust Line Counter

Posted On Thu, 23 May 2013

Filed under Batch

In a previous post, I talked about how to count the number of lines in a text file. I explained the technique of piping the output from type file.txt into find /c /v "" and wrapping the whole thing inside a for /f loop to store the result in a variable. A simple and effective solution to a common Batch programming task. 🙂

Too bad it doesn’t work…

Problem

The problem is that sometimes you can’t make assumptions about your input. There could be anything in that file you’re trying to process: extremely long lines; Unix line endings; or control characters—including my old friend the Null Character.

For reasons unexplained, find outputs null characters as newlines (as does more), which means the old type file.txt | find /c /v "" trick will give an incorrect count if the input file contains any of the little pests. 😮

Program

A more robust solution is needed in these situations. One that won’t be tripped up by anything in the input file.

The program listed below makes extensive use of findstr to determine the number of lines in a text file by searching for Line Feeds (ASCII 10, LF). LF marks the end of a line in both Windows (CR LF) and Unix (LF alone) text files.
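Stated as a rule, an LF-based count gives one line per LF, plus one more if the file is non-empty and its last byte is not an LF. The Python sketch below is only a reference for cross-checking results and is not part of the Batch solution. It reads the file as raw bytes, so NUL and other control characters are treated as ordinary data. The function names are hypothetical.

```python
def count_lines(data: bytes) -> int:
    """One line per LF, plus one for a last line with no trailing LF."""
    count = data.count(b"\n")  # CR, NUL and other bytes are just data here
    if data and not data.endswith(b"\n"):
        count += 1  # unterminated last line
    return count


def count_lines_in_file(path: str) -> int:
    # Read raw bytes so nothing is decoded or newline-translated.
    with open(path, "rb") as f:
        return count_lines(f.read())
```

Under this rule, a file that uses bare CRs and has no LF at all counts as a single line.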

@echo off & setlocal enableextensions
rem Store a single Line Feed character in lf; the empty line below is required.
(set lf=^

)

rem Count the lines in the file named by the first argument and show the result.
call :LFcount lc "%~1"
echo(file "%~1" has %lc% lines

endlocal & goto :EOF

rem :LFcount varName fileName stores the number of lines in fileName in varName.
:LFcount
setlocal enabledelayedexpansion

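rem If the last line has no trailing LF, its line number is the line count.
rem Otherwise, number a # sentinel line appended after the file and subtract 1.
findstr /mv "!lf!" "%~2" >nul && (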
for /f delims^=: %%n in ('
cmd /v:on /c findstr /nv "^!lf^!" "%~2" 2^>nul
') do set lastline=%%n) || (for /f delims^=: %%n in ('
(findstr /n "^" "%~2" ^& echo(#^) ^| findstr /bn # 2^>nul
') do set /a lastline=%%n-1)

endlocal & set "%1=%lastline%" & exit /b 0

Discussion

The main program in this example does nothing except define the lf variable, call the subroutine, and display the result. I've covered everything you would normally expect to find in the main program (such as file validity checks, command-line parameter processing, error messages, etc.) in previous posts. No need to go over it all again… plus the fact that I'm extremely lazy. 😉

The :LFcount subroutine accepts two parameters: a variable name, and the name of a file. The number of lines in the latter will be stored in a variable named after the former.

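First, does the file end with a line feed? The test is findstr /mv "!lf!". Its output is discarded, and its exit code selects one of the two branches via && and ||. If the file doesn't end with a line feed, findstr /nv outputs the line number and contents of every line in the file that doesn't end with a line feed. There can only be one such line, and it has to be the last line of the file, so its line number is the line count. The surrounding for /f loop captures that number (everything before the first colon) and stores it in a variable.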
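To make the reasoning concrete, here is a rough Python model of this first branch. It models the behaviour described above, in which lines are numbered and only the one without a trailing LF is kept. It is not findstr's actual implementation, and the function name is hypothetical.

```python
def unterminated_last_line(data: bytes):
    """Model of the first branch: return the number of the line that does
    not end with LF, or None when the file is empty or ends with LF."""
    # Split on LF only; CR, NUL and other control bytes stay inside lines.
    pieces = data.split(b"\n")
    # Every piece except the last was followed by an LF. The last piece is
    # non-empty exactly when the file does not end with LF.
    if pieces[-1] == b"":
        return None  # the second (sentinel) branch handles this case
    # The unterminated piece is the last line, so its number is the count.
    return len(pieces)
```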

If the file does end with a line feed, the procedure is a little more complicated. The output from findstr /n "^" "%~2", followed by the output of echo(#, is piped into findstr /bn #, which outputs the line number and contents of every line beginning with #. Again, if you think about it, there can only be one match. findstr /n prefixes every line of the file with its line number and a colon, so the only line that can begin with # is the one added by echo(#. That line comes right after the last line of the file. The surrounding for /f loop captures its line number, subtracts 1, and stores the result in a variable.
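The sentinel trick can also be modelled in Python. The sketch assumes the branch's precondition, which is that the data is empty or ends with LF. It also assumes that echo(# ends its line with CR LF and that findstr /n emits "number:" before each line while keeping the original LF. The function name is hypothetical.

```python
def count_via_sentinel(data: bytes) -> int:
    """Model of the second branch, for data that is empty or ends with LF."""
    if data and not data.endswith(b"\n"):
        raise ValueError("this branch assumes a trailing LF")
    lines = data.split(b"\n")[:-1]  # drop the empty piece after the final LF
    # findstr /n "^" prefixes each line with "<number>:" and keeps its LF.
    numbered = b"".join(b"%d:%s\n" % (n, line) for n, line in enumerate(lines, 1))
    # echo(# appends the sentinel line.
    piped = numbered + b"#\r\n"
    # findstr /bn # numbers the combined stream and keeps lines starting with #.
    hits = [n for n, line in enumerate(piped.split(b"\n"), 1) if line.startswith(b"#")]
    # Every file line now starts with a digit, so only the sentinel matches.
    assert len(hits) == 1
    return hits[0] - 1  # the sentinel sits one line past the end of the file
```

A file whose own lines begin with # (for example b"#x\n#y\n") still yields 2, because the numbering prefix hides the # from the second findstr.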

The result is passed back to the main program via a variable named after the first parameter.

Notes

The above code returned the correct number of lines for files with Windows or Unix line endings (but not for files from Mac OS 9 or earlier, which end lines with a bare CR); files with extremely long lines; large files with over 120k lines; and lines containing control characters. The result is returned without any noticeable delay, even for large files. I haven't tested the program with Unicode text files.
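To repeat tests like these, a small Python helper can write sample files and report the count each one should produce. The file names, sizes and contents are arbitrary choices of mine and are not the author's original test files. The expected counts come from the LF-counting rule and were not obtained by running the Batch script. After generating the files, run the Batch script on each path and compare its output with the expected count.

```python
import os


def write_test_files(directory: str) -> dict:
    """Write sample inputs covering the cases in the notes and return
    {path: expected_line_count} under the LF-counting rule."""
    cases = {
        "windows.txt": b"one\r\ntwo\r\nthree\r\n",
        "unix.txt": b"one\ntwo\nthree\n",
        "no_final_lf.txt": b"one\ntwo\nthree",
        "control_chars.txt": b"a\x00b\x00c\n\x07\x1b\n",
        "long_line.txt": b"x" * 100_000 + b"\n",
        "many_lines.txt": b"line\n" * 125_000,
    }
    expected = {}
    for name, data in cases.items():
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
        # One line per LF, plus one for an unterminated final line.
        count = data.count(b"\n") + (1 if data and not data.endswith(b"\n") else 0)
        expected[path] = count
    return expected
```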

Request for Comments

What do you think? Do you know a better way to do it? Is there an error in my code? Would you like to tell me about a link related to this post? Did you find a broken link or spelling mistake?

Whatever it is, I want to hear about it! I’m always interested in your feedback. Please leave a comment below or contact me by email or on Twitter.


2 Responses to “A More Robust Line Counter”

  1. Sponge Belly

    Hi Prashant!

    Thanks for commenting. You are using CR as the line terminator? Are you working with old Mac OS files? Well, I might have a solution for you, but it isn't bulletproof. The input file (infile.txt) cannot contain NUL, LF, or SUB (Ctrl-Z). Try this:

    cmd /d /u /c type infile.txt | find /v "" | findstr /x $ | find /c /v ""

    The result should be a number. Wrap the above inside the in (…) clause of a for /f loop to store the result in the variable lineCount. If the last character in infile.txt is not a CR, you’ll need to add 1 to lineCount like so:

    cmd /von /c findstr /mv "!cr!\>" infile.txt >nul && set /a lineCount+=1

    Hope this helps! If I’ve left anything out, let me know.

    – SB

  2. Prashant

    How do I modify this program to make it work with characters other than LF? I have a file that is generated on Unix; it doesn't have LF at the end of each line, only CR. This program is unable to identify CR as end-of-line and hence always outputs 1 as the result. I thought setting the variable ‘lf’ at the start to ‘$’ would suffice, but it didn't. Any ideas?
