How does the Windows Command Interpreter (CMD.EXE) parse scripts?

Answer

When invoking a command from a command window, tokenization of the command line arguments is not done by cmd.exe (a.k.a. "the shell"). Most often the tokenization is done by the newly formed processes' C/C++ runtime, but this is not necessarily so -- for example, if the new process was not written in C/C++, or if the new process chooses to ignore argv and process the raw commandline for itself (e.g. with GetCommandLine()). At the OS level, Windows passes command lines untokenized as a single string to new processes. This is in contrast to most *nix shells, where the shell tokenizes arguments in a consistent, predictable way before passing them to the newly formed process. All this means that you may experience wildly divergent argument tokenization behavior across different programs on Windows, as individual programs often take argument tokenization into their own hands.

If it sounds like anarchy, it kind of is. However, since a large number of Windows programs do utilize the Microsoft C/C++ runtime's argv, it may be generally useful to understand how the MSVCRT tokenizes arguments. Here is an excerpt:

    Arguments are delimited by white space, which is either a space or a tab.
    A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument. Note that the caret (^) is not recognized as an escape character or delimiter.
    A double quotation mark preceded by a backslash, \", is interpreted as a literal double quotation mark (").
    Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
    If an even number of backslashes is followed by a double quotation mark, then one backslash () is placed in the argv array for every pair of backslashes (\), and the double quotation mark (") is interpreted as a string delimiter.
    If an odd number of backslashes is followed by a double quotation mark, then one backslash () is placed in the argv array for every pair of backslashes (\) and the double quotation mark is interpreted as an escape sequence by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.

The Microsoft "batch language" (.bat) is no exception to this anarchic environment, and it has developed its own unique rules for tokenization and escaping. It also looks like cmd.exe's command prompt does do some preprocessing of the command line argument (mostly for variable substitution and escaping) before passing the argument off to the newly executing process. You can read more about the low-level details of the batch language and cmd escaping in the excellent answers by jeb and dbenham on this page.

Let's build a simple command line utility in C and see what it says about your test cases:

int main(int argc, char* argv[]) {
    int i;
    for (i = 0; i < argc; i++) {
        printf("argv[%d][%s]\n", i, argv[i]);
    }
    return 0;
}

(Notes: argv[0] is always the name of the executable, and is omitted below for brevity. Tested on Windows XP SP3. Compiled with Visual Studio 2005.)

> test.exe "a ""b"" c"
argv[1][a "b" c]

> test.exe """a b c"""
argv[1]["a b c"]

> test.exe "a"" b c
argv[1][a" b c]

And a few of my own tests:

> test.exe a "b" c
argv[1][a]
argv[2][b]
argv[3][c]

> test.exe a "b c" "d e
argv[1][a]
argv[2][b c]
argv[3][d e]

> test.exe a \"b\" c
argv[1][a]
argv[2]["b"]
argv[3][c]

All windows Questions

Ask your interview questions on windows

Write Your comment or Questions if you want the answers on windows from windows Experts
Name* :
Email Id* :
Mob no* :
Question
Or
Comment* :
 





Disclimer: PCDS.CO.IN not responsible for any content, information, data or any feature of website. If you are using this website then its your own responsibility to understand the content of the website

--------- Tutorials ---