What happens when you type `ls -l *.c` in the shell
For us to be able to type ls *.c in the shell terminal, there must be a shell terminal. Sounds logical, but there is a lot more to it, so let us break it apart. The shell terminal is the place where we write a command and the magic happens. The shell is often described as the outermost layer of an operating system, in direct contrast to the kernel, the innermost part. The shell is the part we humans can interact with: it listens to the user through commands and tells the machine what to do by making system calls.
A shell does three main things:
- Initialize: In this step, a typical shell would read and execute its configuration files. These change aspects of the shell’s behavior.
- Interpret: Next, the shell reads commands from the standard input, that is, the command prompt, and executes them.
- Terminate: After its commands are executed, the shell executes any shutdown commands, frees up any memory, and terminates.
Most of the wizardry takes place in the Interpret part, which is implemented as an “infinite” loop. The loop does three things:
- Read: Read the command from standard input.
- Parse: Separate the command string into a program and arguments.
- Execute: Run the parsed command.
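The three steps of the loop can be sketched in C roughly as follows. This is only a minimal sketch under some assumptions: the names shell_loop and execute are made up for illustration, the Parse step only separates the program name from the rest of the line, and the Execute step just echoes what it would run.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the Execute step: it only echoes the command. */
static void execute(const char *program, const char *args)
{
    printf("program=%s args=%s\n", program, args);
}

/* Read-parse-execute loop over the stream `in` (stdin in a real shell).
 * Returns the number of commands handled before end of input. */
int shell_loop(FILE *in)
{
    char *line = NULL;
    size_t cap = 0;
    int handled = 0;

    while (getline(&line, &cap, in) != -1) {               /* Read */
        line[strcspn(line, "\n")] = '\0';                  /* drop the newline */
        char *program = line + strspn(line, " \t");        /* skip leading blanks */
        if (*program == '\0')
            continue;                                      /* empty line, nothing to run */
        char *args = program + strcspn(program, " \t");    /* Parse */
        if (*args != '\0')
            *args++ = '\0';                                /* end the program name here */
        execute(program, args);                            /* Execute */
        handled++;
    }
    free(line);                                            /* release the line buffer */
    return handled;
}
```

A real shell would call something like shell_loop(stdin) and would leave the loop only on end of input or on an exit command.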
Let us go back to our ls command and frame it within what we just said. ls stands for “list”, and according to the manual page, it lists information about the FILEs (the current directory by default) and sorts entries alphabetically if none of -cftuvSUX nor --sort is specified (these are the flags, or options, that ls accepts). So if you type ls in your command line, you get a listing of the files in the current directory.
Pretty straightforward. But there is a lot more happening behind the curtains. Let us go one by one, and keep in mind the acronym E BAB PC PC.
The * (star) expansion
The asterisk or star wildcard (*) matches any string of characters, including none. The shell, not ls, replaces the pattern with the names of the files that match it. When, as in our case, one places it before ‘.c’, you are telling the shell to pass along only the files whose names end with ‘.c’, that is, C files.
Wildcards can also match a range or a class of characters. To match file names that start with x, y, or z, you can write [xyz]*. Or, if you want to match names that end with a lowercase letter, the pattern would be *[[:lower:]].
Notice how in the next example only the files ending with .c are displayed
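To get a feel for how such patterns are matched, here is a minimal sketch in C. It uses the POSIX function fnmatch to test names against a pattern. The helper filter_names and the idea of passing the names in as an array are assumptions made for illustration; a real shell would read the names from the directory itself.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Copy into `out` the entries of `names` that match the wildcard `pattern`.
 * Returns how many names matched. `out` must have room for `n` entries. */
size_t filter_names(const char *pattern, const char *names[], size_t n,
                    const char *out[])
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        /* FNM_PERIOD: a leading '.' in a name must be matched explicitly,
         * which is how shells treat hidden files. */
        if (fnmatch(pattern, names[i], FNM_PERIOD) == 0)
            out[hits++] = names[i];
    }
    return hits;
}
```

With the names main.c, notes.txt and util.c, the pattern *.c would keep main.c and util.c, and those are the arguments ls ends up receiving.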
Break the command into tokens
We enter the command as a single string, but to execute it the shell needs the different parts of this line separated. So the shell tokenizes it: it splits the line into words and puts them into an array of strings.
One can implement this behaviour by reading the line with a function such as getline and splitting it with strtok.
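As a rough sketch of the splitting step, assuming whitespace is the only separator (real shells also handle quotes and escapes), a tokenizer built on strtok could look like this. The function name tokenize and its parameters are made up for illustration.

```c
#include <string.h>

/* Split `line` in place on spaces, tabs and newlines.
 * Fills `argv` with pointers to the tokens, terminates it with NULL,
 * and returns the number of tokens. At most max_args - 1 tokens are kept. */
int tokenize(char *line, char *argv[], int max_args)
{
    int argc = 0;
    char *tok = strtok(line, " \t\n");

    while (tok != NULL && argc < max_args - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");   /* NULL: continue with the same line */
    }
    argv[argc] = NULL;                 /* execve expects a NULL-terminated array */
    return argc;
}
```

For the line "ls -l main.c" this yields the array {"ls", "-l", "main.c", NULL}. Note that strtok writes into the buffer it is given, so the line must be a modifiable copy.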
Check for aliases
Bash aliases allow you to set a shortcut command for a longer command. A bash alias has the following structure:
alias [alias_name]="[command_to_alias]"
It always starts on a new line with the alias keyword. You define the shortcut command you want to use with the alias name, followed by an equal sign and the command it stands for. Under the hood, the shell replaces the alias you entered with the actual command and its arguments before executing it.
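One simple way to picture the alias check is a table lookup on the first word of the command. The sketch below assumes the aliases are kept in a plain array; struct alias and lookup_alias are hypothetical names, not how Bash stores them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table entry: shortcut name and the text it stands for. */
struct alias {
    const char *name;
    const char *value;
};

/* Return the replacement text for `word`, or NULL if it is not an alias. */
const char *lookup_alias(const struct alias *table, size_t n, const char *word)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, word) == 0)
            return table[i].value;
    }
    return NULL;
}
```

If the table holds the entry ll="ls -l", looking up ll gives back "ls -l", while looking up ls gives NULL and the command is left as typed.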
Check builtins
First, let us clarify what builtins are. Builtins are commands implemented inside the shell itself. They are executed directly within the shell, in the present process, and are faster than external programs because no new process has to be created. The shell determines whether the user input is a builtin by comparing the command with an array of builtin names; if it is not one, it moves to the next step, which is looking for the command in the PATH.
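The comparison against an array of builtin names can be sketched like this. The list of names here is just an example chosen for illustration, and find_builtin is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Example list of builtin names, terminated by NULL. */
static const char *const builtin_names[] = { "cd", "exit", "env", NULL };

/* Return the index of `cmd` in builtin_names,
 * or -1 if it is not a builtin and must be looked up in the PATH. */
int find_builtin(const char *cmd)
{
    for (int i = 0; builtin_names[i] != NULL; i++) {
        if (strcmp(cmd, builtin_names[i]) == 0)
            return i;
    }
    return -1;
}
```

The returned index could be used to pick the matching function from a parallel array of function pointers. For ls the result is -1, so the shell moves on.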
Finds the command in the PATH
The shell copies the environment, which is an “array of strings” (AKA a double pointer). One of these strings is the PATH: a line starting with “PATH=” followed by the colon-separated addresses of the directories where executable files are located. The “PATH=” prefix has to be removed, and then the addresses have to be parsed, or tokenized, into another array of strings, one per directory. The shell then concatenates the command to each directory and checks the access permissions of the resulting file. Here it becomes clear that a command is actually the name of an executable file, that is, of a program to be executed. This check is repeated until a file matching the name with the right permissions is found or the array is exhausted, its end marked by a NULL pointer.
In our ls case the shell will start looking for the program file named ls in the possible addresses stored in the PATH.
Call the program ls with all the filenames ending with .c as parameters
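The search through the PATH can be sketched as follows. This is a simplified version under a few assumptions: find_in_path is a made-up name, the PATH value is passed in without the “PATH=” prefix (a shell could get it with getenv("PATH")), the directories are walked in place instead of being copied into a new array, and empty entries are skipped.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search the colon-separated directory list `path` for an executable
 * file called `cmd`. On success the full path is written to `out` and
 * 0 is returned; if nothing is found the function returns -1. */
int find_in_path(const char *cmd, const char *path, char *out, size_t out_size)
{
    const char *dir = path;

    if (path == NULL)
        return -1;

    while (*dir != '\0') {
        size_t len = strcspn(dir, ":");        /* length of this directory */

        if (len > 0) {
            /* Build "directory/command" and test for execute permission. */
            int n = snprintf(out, out_size, "%.*s/%s", (int)len, dir, cmd);
            if (n > 0 && (size_t)n < out_size && access(out, X_OK) == 0)
                return 0;
        }
        dir += len;
        if (*dir == ':')
            dir++;                             /* step over the separator */
    }
    return -1;
}
```

On a typical system, searching for ls in "/usr/local/bin:/usr/bin:/bin" would stop at the first directory that contains an executable ls.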
To call or execute a program means to do the following syscalls:
fork/execve/wait
fork creates a child process. When this function is called, the operating system duplicates the process currently running, and both start to run concurrently. To tell them apart, the original process is called the “parent”, and the new one is called the “child”. fork() returns 0 to the child process, and it returns to the parent the process ID number (PID) of its child. So practically the only way for a new process to start is for an existing one to duplicate itself.
execve does the actual execution of the program. Normally one does not want to just run two copies of the same program. So the exec family of functions replaces the program currently running in the calling process (here, the child's copy of the shell) with an entirely new one (ls). This means that when you call exec, the operating system stops running your program, loads the new program, and starts that one in its place within the same process. A call to exec() never returns (unless there’s an error).
wait suspends the parent process until the child process, which is executing the program, finishes.
As a side note, builtins do not go through this process; they are executed within the parent process. For example, a call to exit within a child process would only exit the child, but we need it to exit the actual shell, and that is why it has to be called from the parent process.
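Put together, the fork/execve/wait sequence might look like the sketch below. The function name run_program is made up for illustration, waitpid is used as the variant of wait that waits for one specific child, and the exit code 127 for a failed execve follows the usual shell convention for a command that could not be run.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the environment of the current process */

/* Run the program at `path` with arguments `argv` in a child process.
 * Returns the child's exit status, or -1 if fork or waitpid failed
 * or the child did not exit normally. */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;                      /* fork failed, no child was created */

    if (pid == 0) {                     /* child: fork returned 0 */
        execve(path, argv, environ);
        _exit(127);                     /* only reached if execve failed */
    }

    /* parent: fork returned the child's PID, so wait for that child */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For our command, the shell would call something like run_program with the path found in the previous step and the array {"ls", "-l", "main.c", "util.c", NULL}.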
When ls is done, print the prompt
The prompt is just the character or characters that signal to us that the terminal is ready to take a new command. It is printed to stdout, also known as standard output, which is the default file descriptor where a process can write output. stdout is defined by the POSIX standard, and its file descriptor number is 1.
Wait for a new command to be entered
Here the loop starts again, so the shell waits for user input at the prompt.
That is it. Next time somebody asks you “What happens when you type ls *.c in the shell command prompt”, remember our acronym, E BAB PC PC: expand, break, alias, built-ins, path, call, prompt and command. It begins with an expansion (think Big Bang), almost has a BABY and gets two PCs. Next time you do the usual ls, you will also know what is really happening.
Notes
- For an explanation of the actual code, take a look at Stephen Brennan's tutorial, Write a Shell in C.