What a "file descriptor" is and how the file system works on the OS


When we edit a file or access data from a file in C/C++, we cannot avoid dealing with file descriptors. Today, I will explain what a file descriptor is and how to manipulate files using file descriptors.

TL;DR

A "file descriptor" (or "fd" for short) is a non-negative integer that identifies an open file within a process. The open() function returns a file descriptor for the file it opens, which you can then use to read from and write to the file. In this sense, a file descriptor refers to a file much like a file path does, but only while the file is open and only inside the process that opened it.

What is the File?

A file is a unit of stored data, which can be text, a program, audio, video, an image, executable code, and so forth. Therefore, .c, .cpp, .swift, .mp4, .jpg, .out, and .pdf are all files.

Then, what is the File Descriptor?

When you manipulate files in your program, you have to open, read, write, and close them. On Unix-like systems (such as Linux and macOS), doing this from C/C++ with the system's own I/O functions means getting a file descriptor for each file. Let's see an example first:

#include <fcntl.h>
int main() {
    // path to the file
    char path[] = "example.txt";
    // get the file descriptor
    int fd = open(path, O_RDONLY);
}

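Once a file is successfully opened, you get a file descriptor, and you handle the file through it from then on. The file descriptor is a non-negative integer. It is usually 3 or greater, because 0, 1, and 2 are normally already taken by standard input, standard output, and standard error.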

Important terms in the file system

Before diving into how to use file descriptors, let me explain a few important terms of the file system for better understanding.

  1. File Descriptor Table

The file descriptor table is a per-process table that has an entry for each file descriptor. Every process has its own file descriptor table, and when a file is opened the operating system assigns the lowest unoccupied index in the table of the calling process. Therefore, even if the file descriptor of "example.txt" is 3 in process "A", it does not mean the file descriptor of "example.txt" is also 3 in process "B".

  2. Open File Table & File Table Entry

The open file table is a table that has an entry for each open instance of a file. Unlike the file descriptor table, the open file table is shared among all processes. Every call to open() creates a new entry, even when the same file is already open. Multiple file descriptors point to the same entry only when one was copied from another, for example with dup() or when a child process inherits descriptors from its parent through fork().
Each entry in the open file table is called a file table entry. A file table entry contains information specific to an open instance of a file, such as the current file position, the status flags, the reference count, and the reference to the corresponding inode table entry.

Note

  • File status flags: represent whether the file is opened in read-only mode, write-only mode, or another mode.

  • File position (also known as file offset or file pointer): the position where the next read or write starts. It is initially set to 0. If 10 bytes are read from the file, the file position is updated to 10, and the next read or write starts at 10.

  • Reference count: the number of file descriptors that refer to the file table entry. If 4 file descriptors refer to a file table entry, its reference count is 4.

  3. Inode Table & Inode

The inode table is a system-wide table that contains an entry for each file in the file system. Each entry in the inode table is called an inode (index node). It represents a specific file and contains metadata about the file, such as the file size, permissions, ownership, timestamps, and the location of the file's data on the storage.

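To sum up, a file descriptor is an index into the per-process file descriptor table, each entry there points to a file table entry, and each file table entry points to an inode. The file system can be visualized like the following image: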

Why is the file descriptor used rather than the file path?

You may think that a file path can also uniquely identify a file. However, it is desirable to use a file descriptor instead of a file path for several reasons:

  • Efficiency: Manipulating files using a file path requires more steps than using a file descriptor. A file path has to be parsed and resolved every time it is used, while a file descriptor refers directly to a file that is already open.

  • Convenience: Standard input, standard output, and standard error are the file descriptors 0, 1, and 2 respectively, and you handle them with the read() and write() functions. Because the file descriptors of files are integers too, you can handle files in the same way.

  • Independence: File descriptors abstract away the details of the underlying file systems (see the visualization above). This allows the same code to work across different file systems and across Unix-like operating systems.

  • Low-level programming: File descriptors are fundamental to low-level system programming. They are the interface through which the operating system's low-level I/O functions work with files.

How to utilize file descriptors

As I mentioned above, you can open a file and get its file descriptor by calling the open() function with a file path.

#include <fcntl.h>
int main() {
    char path[] = "example.txt";
    int fd = open(path, O_RDONLY);
}

The first argument of the open() function is the path to the file, and the second is the mode in which the file is opened. Here are several useful modes:

  • O_RDONLY: The opened file is read-only.

  • O_WRONLY: The opened file is write-only.

  • O_RDWR: The opened file can be read and written.

  • O_APPEND: The file position is set to the end of the file before each write, so you add text to the end of the file instead of overwriting it.

  • O_TRUNC: If an existing file is opened for writing with this mode, the file is truncated to 0 bytes; the content of the file is discarded. As a result, you write text to an empty file. O_TRUNC does not create a missing file by itself, so add O_CREAT if the file may not exist. This mode is used in combination with O_RDWR or O_WRONLY, like open(path, O_WRONLY | O_TRUNC).

  • O_CREAT: If the file already exists, the file is opened normally based on the other specified flags. If the file does not exist, a new empty file is created. With this mode, open() takes a third argument that gives the permissions of the new file, like open(path, O_WRONLY | O_CREAT, 0644). This mode is also often used in combination with other flags, as O_TRUNC is.

Note

  • The modes are, in fact, integer constants defined in "fcntl.h", which is why they can be combined with the | operator.

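If the file is successfully opened, the open() function returns a file descriptor, which is the lowest number not currently in use by the process (usually 3 or greater). If the file fails to open, the open() function returns -1 and sets errno to indicate the reason.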

#include <fcntl.h>
int main() {
    char path[] = "example.txt";
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        // handle error
    }
}

Once you get a file descriptor, you can read and write contents with the read() and write() functions. Let's see an example of reading the contents of a file.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#define BUFFER_SIZE 100

int main() {
    char path[] = "example.txt";
    int fd = open(path, O_RDONLY);

    if (fd == -1) {
        // error handling for failure of open
        perror("Failed to open the file");
        return 1;
    }

    char buffer[BUFFER_SIZE];
    ssize_t bytesRead = read(fd, buffer, BUFFER_SIZE - 1);
    if (bytesRead == -1) {
        // error handling for failure of read
        perror("Failed to read from the file");
        close(fd);
        return 1;
    }
    buffer[bytesRead] = '\0';

    // print the contents read
    printf("File contents:\n%s\n", buffer);
    // close file
    close(fd);

    return 0;
}

Note

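  • The contents read by the read() function are stored in the char[] variable buffer. The read() function does not null-terminate the data, so the program adds '\0' itself after the last byte read. This is why it reads at most BUFFER_SIZE - 1 bytes.

  • If the program fails to read the contents, the read() function returns -1. A return value of 0 means the end of the file was reached.

  • When you have finished manipulating a file, you should close it with the close() function.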

Next, let's see an example of write:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main() {
    // Open the file for writing and truncate its content
    int fd = open("file.txt", O_WRONLY | O_TRUNC);
    if (fd == -1) {
        perror("open");
        return 1;
    }

    // Write new content to the file
    const char *content = "New content";
    ssize_t bytesWritten = write(fd, content, strlen(content));
    if (bytesWritten == -1) {
        perror("write");
        close(fd);
        return 1;
    }

    // Close the file
    close(fd);

    return 0;
}

Note

  • The write() function also returns -1 when it fails to write contents. On success, it returns the number of bytes written, which can be less than the number requested.