Large File Support in Native Client

Summary

This document explains how large file support is implemented in the Native Client version of glibc.

Objective

Our objective is to provide functions that work with large files and to remove the legacy functions that work with 32-bit file sizes. Moreover, file functions should not carry a ‘64’ suffix.

Background

In the 32-bit Linux world, glibc originally exposed interfaces with 32-bit offsets and file sizes. Then the problem of creating files larger than 2 GB arose, and Large File Support (LFS) was created to solve it. For each file function there is a counterpart with the suffix ‘64’ appended to its name that works with large files; for example, glibc exports both stat() and stat64(). See more information on source-level LFS support at gnu.org.

x86-32

If a 32-bit program defines _LARGEFILE64_SOURCE, then both sets of functions become available to the program. If the program defines _FILE_OFFSET_BITS to be 64, then each usual function is mapped to its 64-bit counterpart and works with large files. The mapping is done using assembler labels: void stat() __asm__("stat64");. If both macros are defined, then both names are available, and both refer to the 64-bit function and work with large files.
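The assembler-label mapping can be tried in isolation. The following is a minimal sketch, assuming GCC or Clang on an ELF target such as Linux; demo_size and demo_size64 are hypothetical names used only for illustration and are not part of glibc.

```c
#define _FILE_OFFSET_BITS 64   /* request the 64-bit interface under the usual names */
#include <assert.h>
#include <sys/types.h>

/* With _FILE_OFFSET_BITS=64, off_t is 64 bits wide even in a 32-bit build. */
_Static_assert(sizeof(off_t) == 8, "off_t should be 64 bits wide");

/* Hypothetical "large file" implementation, standing in for something like stat64(). */
long long demo_size64(void)
{
    return 5000000000LL;   /* a size that does not fit in 32 bits */
}

/* Assembler label: the C name demo_size is bound to the symbol demo_size64,
 * so every call to demo_size() is emitted as a call to demo_size64. */
long long demo_size(void) __asm__("demo_size64");

/* Both names reach the same code. */
void check_mapping(void)
{
    assert(demo_size() == demo_size64());
}
```

No symbol named demo_size ends up in the object file; only the C-level name is redirected, which is why the program source can keep using the unsuffixed name.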

x86-64

64-bit programs have 64-bit types for file sizes by default. If _LARGEFILE64_SOURCE is defined, the program gets two identical sets of functions and types. If _FILE_OFFSET_BITS is defined to be 64, the program gets one set of functions (without the suffix) that are mapped to the functions with the ‘64’ suffix. So glibc supports both kinds of functions even in 64-bit mode; unlike in 32-bit mode, the two kinds are identical.
Glibc itself has to support all of these modes, so it is compiled with _LARGEFILE64_SOURCE enabled and therefore uses and exports both the usual and the 64-bit functions and types.
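The claim that the two sets are identical in 64-bit mode can be checked by comparing the sizes of the plain and the ‘64’ types. This is a small sketch, assuming a glibc-style C library that exposes off64_t and struct stat64 when _LARGEFILE64_SOURCE is defined; the helper names off64_bytes and lfs_types_identical are hypothetical.

```c
#define _LARGEFILE64_SOURCE    /* expose off64_t, struct stat64, stat64(), ... */
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Width of the explicit 64-bit offset type, in bytes. */
int off64_bytes(void)
{
    return (int)sizeof(off64_t);
}

/* Returns 1 when the plain and the '64' types have the same size,
 * which is the expected result on a 64-bit target, and 0 otherwise
 * (for example, in a 32-bit build without _FILE_OFFSET_BITS=64). */
int lfs_types_identical(void)
{
    return sizeof(off_t) == sizeof(off64_t)
        && sizeof(struct stat) == sizeof(struct stat64);
}
```

Building the same file for a 32-bit target without _FILE_OFFSET_BITS=64 should make lfs_types_identical() return 0, which is the difference between the two modes described above.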

New Clean Interface

Most modern programs use _FILE_OFFSET_BITS=64, so functions with the ‘64’ suffix are a legacy that we can avoid. In Native Client, a program uses 64-bit file functions that are exported under their usual names (without the ‘64’ suffix).
We still have both stat and stat64 inside glibc itself because it uses both of them heavily. The scheme is achieved using a combination of aliases and assembler names. Unfortunately, we now need to consider two cases: stat.o and stat.os.
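Before looking at the two cases, the two building blocks can be compared side by side. The sketch below is a generic illustration, assuming GCC or Clang on an ELF target; it does not reproduce the actual Native Client glibc scheme, and all demo_* names are hypothetical.

```c
#include <assert.h>

/* Hypothetical 64-bit implementation, standing in for something like stat64. */
long long demo_query64(int fd)
{
    return 4294967296LL + fd;
}

/* Alias: the object file gains a second symbol, demo_query_alias,
 * that refers to the same code as demo_query64. */
long long demo_query_alias(int fd) __attribute__((alias("demo_query64")));

/* Assembler name: no new symbol is created; the C name demo_query is
 * simply emitted as a reference to the symbol demo_query64. */
long long demo_query(int fd) __asm__("demo_query64");

/* Both routes end up in the same implementation. */
void check_both_paths(void)
{
    assert(demo_query_alias(3) == demo_query64(3));
    assert(demo_query(3) == demo_query64(3));
}
```

The practical difference is in what gets exported: an alias adds a symbol to the object file, while an assembler name only changes which symbol a declaration refers to.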
stat.o case (lines represent assembler names):

stat.os case (dashed lines represent aliases):