Convert Newlines from Windows to Unix

Posted On Wed, 24 Oct 2012

Filed under Batch

16-Apr-2013: Updated source code. Rewrite of post.

Everybody knows the old trick of converting a text file with Unix newlines (LF) to Windows line-endings (CR+LF):

more unix.txt > win.txt

(Note that more will wait for a keypress after scrolling 65,534 lines, even if output is redirected to a pipe or file.) But converting from Windows to Unix is a far more complicated affair. A search turned up no straightforward Batch solutions, apart from this meandering thread on DosTips, so I cranked out win2unix.cmd as outlined below.

Program

win2unix.cmd accepts input from a pipe or file (but not both). Output can be redirected to a file or piped to another command for further processing (such programs are known as “filters” in computer parlance). Folders, empty or non-existent files, or wildcards in the filename will throw an error. Enter win2unix /? for basic usage info. Read the notes for limitations on use.
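The two basic ways of calling the filter might look like the lines below when entered at the command prompt. This is only a sketch: it assumes win2unix.cmd is in the current folder or on the PATH. The names win.txt and unix.txt, and the choice of type as the upstream command, are placeholders rather than anything the program requires.

```batch
rem Read from a file and write the converted text to a new file.
win2unix win.txt > unix.txt

rem Read from a pipe instead; type simply stands in for any command.
type win.txt | win2unix > unix.txt
```

Inside another Batch file, the first form would need a leading call (call win2unix win.txt > unix.txt), otherwise control never returns to the calling script.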

@echo off & setlocal enableextensions
if "%~1" neq "/re-enter" goto init
shift /1
if "%~1" neq "" (call :win2unix "%~1") else call :win2unix
goto end

:init
(set lf=^

)
set nl=^^^%lf%%lf%^%lf%%lf%
for /f %%h in (^"/?%nl%/h%nl%/he%nl%/hel%nl%/help^") ^
do if /i "%~1"=="%%h" call :usage && goto end
if "%~2" neq "" (>&2 echo(too many arguments& (call) & goto end)

(call;)

setlocal enabledelayedexpansion
if /i "!cmdcmdline!" neq "!cmdcmdline:%comspec%  /s /d /c=!" ^
set "piped=1"
endlocal & set "piped=%piped%"

if "%~1" neq "" (if exist "%~1\" (>&2 echo("%~1" is a folder
(call) & goto end) else if not exist "%~1" (
>&2 echo(file "%~1" not found& (call) & goto end)
echo("%~1" | findstr "\* \?" >nul && (
>&2 echo(wildcards (* and ?^) not permitted& (call) & goto end)
if "%~z1"=="0" (>&2 echo(file "%~1" is empty& (call) & goto end)
if defined piped (
>&2 echo(specify input from pipe OR file--but not both& (call)
goto end)) else if not defined piped call :usage && goto end

call "%~dpf0" /re-enter "%~1" | findstr /v "^$"

:end
endlocal & goto :EOF

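:win2unix
setlocal
rem With no filename, findstr reads from standard input instead.
if "%~1"=="" (set "file=") else set "file= "%~1""

rem findstr /n numbers every line, so empty lines are not skipped by for /f.
for /f "delims=" %%i in ('findstr /n "^"%file%') do (
set "line=%%i"
setlocal enabledelayedexpansion
rem Strip the line-number prefix, then append a Line Feed.
set "line=!line:*:=!!lf!"
echo(!line!
endlocal
)

endlocal & exit /b 0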

:usage
set ^"\n=^^^%lf%%lf%^%lf%%lf%^^"
cls & echo(Converts newlines from Windows to Unix.%nl%%\n%
Usage:%nl%%\n%
  %~n0 win.txt [^> unix.txt]%\n%
  %~n0 win.txt [^| command-name]%\n%
  command-name ^| %~n0 [^> unix.txt]%\n%
  command-name1 ^| %~n0 [^| command-name2]%nl%%\n%
where win.txt has ^<CR^>^<LF^> line-endings and unix.txt ^
uses ^<LF^> for end-of-line.%nl%%\n%
Notes:%\n%
- Writes to Standard Output ^(STDOUT^) by default.%\n%
- Input should be 8-bit ASCII.%\n%
- Null Character (ASCII 0) in input will corrupt output.%\n%
- Cannot process lines longer than approx 8kb.
exit /b 0

Discussion

I need to draw your attention to the following three items before I can explain how the program works:

  1. Queue revealed in this DosTips topic that (call;) sets the dynamic variable errorlevel to 0 and (call) sets it to 1. So if you’re wondering what all those empty call statements in the program are about, now you know. 🙂

  2. The arcane code used to determine whether input is from a pipe or file (the result is stored in the piped variable) was obtained from the alt.msdos.batch.nt newsgroup where wizards dwell…

  3. Input can’t be piped into a CALLed subroutine… at least, not directly. Jeb explains why and supplies a clever workaround in this SO thread. Using this technique, win2unix.cmd avoids the need for temporary files.
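Item 1 is easy to check for yourself. The following sketch is not part of win2unix.cmd. It just runs the two empty call statements and prints errorlevel after each one, which should show 1 and then 0 if the behaviour is as described.

```batch
@echo off
setlocal enableextensions

rem (call) should leave errorlevel at 1.
(call)
echo errorlevel after (call) is %errorlevel%

rem (call;) should reset errorlevel to 0.
(call;)
echo errorlevel after (call;) is %errorlevel%

endlocal
```

In the program, (call) marks each error exit and the single (call;) clears errorlevel once the argument checks have passed.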

Briefly, input is split into lines and stripped of line endings inside the in (...) clause of the for /f loop used by the :win2unix subroutine. Next, Line Feed (LF) is appended to every line of input by the set "line=!line:*:=!!lf!" command. Then echo(!line! adds a Windows newline pair (CR+LF) when it outputs the line. Lastly, findstr /v "^$" strips away the CR+LF leaving a single LF at the end of every line of output. Tada! Output is left with Unix line endings. Pretty simple, really. The hard part was making it work. 😉
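The string handling in the middle of that pipeline can be tried in isolation. The sketch below is a stand-alone test, not an excerpt from the program. The sample text stored in line is made up to resemble one line of findstr /n output, and the square brackets are only there to make the appended Line Feed visible.

```batch
@echo off
setlocal enableextensions

rem Store a single Line Feed character in lf (the blank line is required).
(set lf=^

)

rem Made-up sample in the form produced by findstr /n, number then colon then text.
set "line=12:some text: with a colon"

setlocal enabledelayedexpansion
rem Remove everything up to and including the first colon, then append LF.
set "line=!line:*:=!!lf!"
echo([!line!]
endlocal

endlocal
```

Run in a console window, the closing bracket should land on a line of its own, because the LF now sits between the text and the bracket. Only the first colon is consumed, so colons in the text itself survive.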

It would have been simpler and probably more robust if I’d used a temporary file, but I was aiming for maximum portability. Some environments (especially in the corporate world) can be very restrictive indeed and ordinary users mightn’t even be allowed to create temporary files.

As always, please feel free to leave a comment with your thoughts and suggestions.

Related Links

  • Q139 of Prof. Timo Salmi’s Batch FAQ.

  • The excellent Dos2Unix utility is freeware and open source. Available for multiple platforms. Features include:

    • converts newlines to/from Windows, Unix, and even Mac (MacOS 9 or earlier)
    • edits files in place or makes back-ups
    • accepts wildcards from the command line
    • processes entire directory trees with one command
    • automatically skips binary files

    and many more! This tool does it all.

  • This post evolved from a DosTips topic started by Yours Truly.
