Copy a Long Line from a Text File and Save It to a New File

This is a follow-up to my recent post on how to store the nth line of a text file in a variable. The solution given there won’t work for extremely long lines because the requested line is stored in a variable, and a variable in Batch can hold at most 8191 characters.

If you need to select an extremely long line from a file and save it to a new file, there is a workaround. But it ain’t pretty, or efficient. It involves more, findstr, and a whole lot of temporary files. 😈

Example

In this example, we’re looking for the 100th line from longlines.txt. If the 100th line happens to be the last line of the file, all we have to do is:

more +99 longlines.txt > line100.txt

And we’re done! Note that the +99 switch tells more to skip the first 99 lines, so output starts at line 100. However, the last line is a special case. The rest of the time, we have to jump through the following hoops…
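
If you’re not sure whether line 100 really is the last line, you can count the lines first. Here is a minimal sketch for a batch file, assuming longlines.txt sits in the current folder. The variable name lines is only for illustration, and the counting trick (find /c /v "") is the same one the program further down relies on.

```batch
@echo off
rem Start from zero so the comparison below still works if nothing is counted.
set /a lines=0
rem Count every line in the file, blank or not.
for /f %%c in ('type "longlines.txt" ^| find /c /v ""') do set /a lines=%%c
rem If line 100 is the last line, the one-step shortcut is enough.
if %lines% equ 100 (
    more +99 longlines.txt > line100.txt
) else (
    echo(line 100 is not the last line, so the longer route is needed
)
```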

First, we copy all but the first 99 lines of longlines.txt to a new file called longlines-1.txt using this command:

more +99 longlines.txt > longlines-1.txt

Note that if the last line in the file doesn’t end with a newline (CR+LF), more will append one (whether we want it to or not).

Next, we add numbers to the start of each line with the help of findstr:

findstr /n $ longlines-1.txt > longlines-2.txt

The $ is a special character used by findstr to match the end of a line. A loose translation of the command above would be: “Add a consecutive number (starting from 1) to the start of every line in longlines-1.txt that ends with a newline and save the output to longlines-2.txt.” Note how we don’t have to worry about a missing newline at the end of the file since more has already taken care of that for us!

At this point, we’re finished with longlines-1.txt. It could potentially be a huge file so let’s delete it:

del longlines-1.txt

Now we need to extract the first line of longlines-2.txt and save it to a file on its own. Because we’ve already numbered the lines in the file, we can be certain that the line we require uniquely begins with 1:. This is yet another job for findstr:

findstr /b "1:" longlines-2.txt > longline1.txt

The /b switch tells findstr to only match lines that contain 1: at the beginning of the line. The output from findstr is redirected into the longline1.txt file. We’re finished with longlines-2.txt at this point and it can be safely deleted.

Next, we need to remove the 1: from the start of the long line. To do this, we more the file and pipe the output through a chain of two pause commands. A side-effect of pause is that it consumes the first character piped into it. We redirect pause’s output to nul to suppress the “Press any key to continue” message and then we simply redirect the resulting output to a file using a second more command. The whole thing goes a little something like this:

more longline1.txt | (pause >nul & pause >nul & more > longline2.txt)
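
To see the pause trick on its own, you can try it on a short string instead of a file. This is only a sketch: the test string abcdef is made up, and if pause behaves as described above, each pause should swallow one character and more should pass on whatever is left.

```batch
@echo off
rem Two pause commands in a row each take one character from the pipe.
rem The final more prints the rest of the piped text to the console.
echo(abcdef| (pause >nul & pause >nul & more)
```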

We can now delete longline1.txt. All those more commands and use of pipes will have appended one or more additional newlines to the end of the file. To get rid of them, we have to—you guessed it—rely on our old friend findstr one last time. The following filters out all blank lines:

findstr /v "^$" longline2.txt > line100.txt

And at long last we have picked out the 100th line and saved it to a new file (assuming it wasn’t a blank line in which case the file will be empty).
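
For convenience, here are the steps above collected into one batch file. It’s a sketch under the same assumptions as the walk-through: longlines.txt is in the current folder, line 100 is not the last line, and none of the temporary file names are already in use.

```batch
@echo off
rem Walk-through of the steps above for line 100 of longlines.txt.

rem 1. Drop the first 99 lines.
more +99 longlines.txt > longlines-1.txt
rem 2. Number the remaining lines, so the wanted line starts with "1:".
findstr /n $ longlines-1.txt > longlines-2.txt
del longlines-1.txt
rem 3. Keep only the line numbered 1.
findstr /b "1:" longlines-2.txt > longline1.txt
del longlines-2.txt
rem 4. Strip the "1:" prefix, one character per pause.
more longline1.txt | (pause >nul & pause >nul & more > longline2.txt)
del longline1.txt
rem 5. Remove the blank lines added along the way.
findstr /v "^$" longline2.txt > line100.txt
del longline2.txt
```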

Program

The example above is easy to follow by entering one command at a time at the Command Prompt. For a more general solution, though, you’ll need to write a program, and that can be a can of worms: you’ll run into all sorts of problems with verifying filenames, checking that the line number is within the permitted range, and so on…

Well, it’s a good thing I got here first and did it for you! 😉 Consider it an early Christmas present.

@echo off & setlocal enableextensions
set "a2z=abcdefghijklmnopqrstuvwxyz" & set "nos=0123456789"

if "%~1"=="" (call :usage && goto end) else set "nth=%~1"
if "%~2"=="" (call :error "specify a file name" || goto end
) else set "infile=%~2"
if "%~3" neq "" (echo("%~3"| findstr /ix^
 ^"\"[%a2z%][%a2z%%nos%\._-]*[%a2z%%nos%]\"^" >nul ^
&& (if not exist "%~3" (set "outfile=%~3") else (
call :error ""%~3" already exists" || goto end)) ^
|| (call :error ""%~3" is invalid filename" || goto end))

for /f "tokens=1* delims=0123456789" %%a in ("A0%nth%") ^
do if "%%b" neq "" ^
call :error ""%nth%" is invalid line number" || goto end
set "nz=" & for /f "tokens=* delims=0" %%z in ("%nth%") do set "nz=%%~z"
if defined nz (set "nth=%nz%") else set /a nth=1

if exist "%infile%\" (call :error ""%infile%" is a folder" || ^
goto end) else if not exist "%infile%" (
call :error "file "%infile%" not found" || goto end)
echo("%infile%" | findstr "\* \?" >nul ^
&& (call :error "wildcards not permitted in filename" || ^
goto end)

for /f "tokens=1" %%c in ('type "%infile%" ^| find /c /v ""') ^
do set /a lines=%%c
if %lines%==0 call :error ""%infile%" is an empty file" || ^
goto end
if %nth% gtr %lines% ^
call :error "specify a line number between 1 and %lines%" || ^
goto end

if "%~3"=="" set "outfile=%~n2_line%nth%.txt"
if exist "%outfile%" ^
call :error ""%outfile%" already exists" || goto end
(type nul >"%outfile%") 2>nul || ^
call :error "unable to create files in current folder" || ^
goto end

set /a nth-=1,lines-=1
if %lines%==%nth% (
more +%nth% "%infile%" >"%outfile%" & goto result)
call :rfn "%infile%-1" infile1
(type nul >"%infile1%") 2>nul || ^
call :error "unable to create temporary files" || ^
goto end
more +%nth% "%infile%" > "%infile1%"
call :rfn "%infile%-2" infile2
findstr /n $ "%infile1%" > "%infile2%"
del /f /q "%infile1%"
call :rfn "%infile%-line1" infile-line1
findstr /b "1:" "%infile2%" > "%infile-line1%"
del /f /q "%infile2%"
call :rfn "%infile%-line2" infile-line2
more "%infile-line1%" | (pause >nul & pause >nul & more >"%infile-line2%")
del /f /q "%infile-line1%"
findstr /v "^$" "%infile-line2%" > "%outfile%"
del /f /q "%infile-line2%"

:result
set /a nth+=1
echo(line %nth% of file "%infile%" has been saved to "%outfile%"

:end
endlocal & goto :EOF

:rfn random filename generator
setlocal
:retry
set /a randno=%random%
set "randfn=%tmp%\%~n1-%randno%.tmp"
if exist "%randfn%" goto retry
endlocal & set "%~2=%randfn%" & exit /b 0

:usage
cls
1>&2 (echo(Copies a long line from a text file to a new file.
echo(&echo(Usage:&echo(&echo(  %~n0 N InFile [OutFile]&echo(
echo(where the contents of line number N from file Infile ^
will be saved to file&echo(OutFile.  If OutFile is not ^
specified, a filename based on the InFile's name&echo(will ^
be generated.)
exit /b 0

:error
for /f delims^=^ eol^= %%e in ("%*") do 1>&2 echo(%%~e
exit /b 1
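
Here’s how you might call it from the Command Prompt. The script name longline.cmd is purely hypothetical (save the program under whatever name you like), and the file names are the ones from the example earlier. On success the program reports which line was saved and where.

```batch
rem Assumes the program above was saved as longline.cmd, a hypothetical name.
rem Save line 100 of longlines.txt to line100.txt:
longline 100 longlines.txt line100.txt
rem Leave out the third argument to get a generated name, here longlines_line100.txt:
longline 100 longlines.txt
```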

One small thing to be mindful of is that more will pause and wait for a keypress after 65,535 lines regardless of whether it’s piping data, redirecting a file, or sending output to the console.
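
If you’d rather be warned up front, a quick line count will tell you whether a file is big enough to trigger that pause. This is a cautious sketch, not part of the program above: it assumes longlines.txt in the current folder, uses the 65,535 figure quoted above as the threshold, and warns on the total line count whether or not more would actually get that far.

```batch
@echo off
rem Start from zero so the comparison below still works if nothing is counted.
set /a lines=0
rem Count every line in the file, blank or not.
for /f %%c in ('type "longlines.txt" ^| find /c /v ""') do set /a lines=%%c
rem Warn when the file is long enough for more to stop and wait for a key.
if %lines% gtr 65535 echo(warning: longlines.txt has %lines% lines, so more may pause and wait for a key
```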
