2012-02-03

File Systems and MTD Technology

This section introduces file systems (FS) and MTD technology.

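1 FS
The well-known file systems are ext2, ext3 and ext4, but these are all designed for disk devices. Embedded systems usually store data on flash, and flash has some peculiarities:
  • Flash is divided into erase blocks, and one erase block is typically tens of KB or more (compare a disk sector, which is only 512 bytes). What is the downside of such a large block? Obviously, erasing one block takes a long time.
  • Flash can only be erased one whole block at a time. So if I want to rewrite a single byte in place, I first have to erase the whole block, for example 128 KB. That is painful.
  • Each block survives only a limited number of erase cycles (commonly quoted figures are on the order of 100,000, depending on the flash type). If every write hits the same block, that block dies quickly.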
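The erase-before-rewrite constraint can be sketched with a toy model. This is only an illustration under simplifying assumptions: `EraseBlock` and `modify_byte` are hypothetical names, not a kernel or library API, and real flash programs pages or words rather than single bytes.

```python
ERASED = 0xFF  # an erased flash byte reads back as all ones


class EraseBlock:
    """Toy model of one flash erase block (hypothetical, not a driver API)."""

    def __init__(self, size=128 * 1024):
        self.data = bytearray([ERASED]) * size
        self.erase_count = 0

    def erase(self):
        """Erase works only on the whole block: every bit goes back to 1."""
        self.data = bytearray([ERASED]) * len(self.data)
        self.erase_count += 1

    def program(self, offset, value):
        """Programming can only clear bits (1 -> 0), never set them."""
        self.data[offset] &= value


def modify_byte(block, offset, value):
    """Change one byte: read the block, modify in RAM, erase, write back."""
    copy = bytearray(block.data)   # read the whole block into RAM
    copy[offset] = value           # modify the copy
    block.erase()                  # erase the whole block
    for i, b in enumerate(copy):   # program everything back
        block.program(i, b)


blk = EraseBlock(size=16)          # tiny block so the demo stays fast
blk.program(0, 0x0F)
blk.program(0, 0xF0)               # overwriting in place only clears more bits
modify_byte(blk, 0, 0xF0)          # the full cycle is needed to store 0xF0
```

Note how a one-byte change costs a full erase of the block, which is where both the slowness and the wear come from.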
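So, to cope with these properties of flash, the Journalling Flash File System was created (JFFS2 is its second version). Its key feature is:
  • Wear leveling: erases and writes do not keep landing on one block. For example, if one file was written to block 1, then when a new file arrives the file system picks block 2 instead of block 1.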
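The idea of wear leveling can be illustrated with a very small allocator that always picks the least-erased block. This is a sketch of the general idea only, not JFFS2's actual algorithm; the `WearLeveler` class is a hypothetical name introduced for the example.

```python
class WearLeveler:
    """Toy allocator: always write to the block with the fewest erases."""

    def __init__(self, num_blocks):
        self.erase_counts = [0] * num_blocks

    def pick_block(self):
        """Return the index of the least-worn block (lowest index wins ties)."""
        return min(range(len(self.erase_counts)),
                   key=self.erase_counts.__getitem__)

    def write(self):
        """Simulate one rewrite, which costs one erase on the chosen block."""
        blk = self.pick_block()
        self.erase_counts[blk] += 1
        return blk


wl = WearLeveler(4)
chosen = [wl.write() for _ in range(8)]
print(chosen)            # the writes rotate over all four blocks
print(wl.erase_counts)   # wear ends up evenly spread
```

Without such a policy, eight rewrites of the same file would put all eight erases on one block.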
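(apt-get install mtd-tools downloads quite a few useful tools; on newer Debian/Ubuntu releases the package is named mtd-utils. Also note that writing logs to a flash device can shorten the life of the flash.)
Some pseudo file systems: proc and sysfs. The systool utility can be used to inspect sysfs.
Other file systems commonly used in embedded systems are ramfs, rootfs and tmpfs, all of which are memory-based. How do they relate to the initramfs discussed earlier? initramfs is built on top of them: the kernel unpacks the initramfs archive into rootfs, which is itself an instance of ramfs or tmpfs.
A few things that are easy to confuse need explaining here:
  • ram disk: a disk emulated in RAM, that is, a virtual block device. Which file system sits on top of it does not matter. This causes efficiency problems, because read and other operations still have to go through the block driver, and if an ext2 file system is built on the device, the ext2 module is needed too. A hassle.
  • ramfs: a file system that lives in memory rather than on a real device. Its defining trait is that a user can keep writing until system memory is exhausted.
  • tmpfs was introduced to rein in that drawback: a size limit can be specified for a tmpfs. (This makes it a bit like a real device, whose storage space is also finite.)
  • rootfs is a special instance of ramfs or tmpfs (depending on whether tmpfs is enabled in the Linux kernel). In addition, rootfs cannot be unmounted.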
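To see which of these memory-based file systems a running system actually uses, the mount table can be inspected. The sketch below parses text in the /proc/mounts format (device, mount point, type, options, ...). The sample data is made up for illustration, and `memory_backed_mounts` is a hypothetical helper name.

```python
MEMORY_FS = {"ramfs", "tmpfs", "rootfs"}


def memory_backed_mounts(mounts_text):
    """Return (mount_point, fs_type, size_option) for memory-based mounts."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in MEMORY_FS:
            continue
        size = None
        for opt in options.split(","):
            if opt.startswith("size="):      # only tmpfs carries a size limit
                size = opt[len("size="):]
        result.append((mount_point, fs_type, size))
    return result


# Made-up sample in /proc/mounts format, for illustration only.
SAMPLE = """\
rootfs / rootfs rw 0 0
/dev/root / ext2 rw 0 0
tmpfs /tmp tmpfs rw,size=16384k 0 0
none /mnt/ram ramfs rw 0 0
"""

for entry in memory_backed_mounts(SAMPLE):
    print(entry)
```

On a real Linux system the same function could be fed the contents of /proc/mounts. Only the tmpfs entry reports a size, which matches the ramfs/tmpfs distinction described above.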
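The following shows how to use mount's loop option to create a virtual, file-backed block device.
  • First create a file full of zeros with dd: dd if=/dev/zero of=../fstest bs=1024 count=512. This copies data from if to of, 1024 bytes at a time, 512 times in total, giving a 512 KB file. A hex viewer will show that the generated file contains nothing but 0x00.
  • Create a file system inside this file: mkfs.ext2 ../fstest. Now the file system exists inside the file. A file system is after all just a set of organizing structures, such as the superblock and inodes.
  • How do we mount this file that carries file system information? In other words, how do we treat the file as a block device? Use mount's loop option: mount -t ext2 -o loop ../fstest /tmp/. The file is then mounted on /tmp as a virtual block device.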
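The first two steps can be checked without root privileges. The sketch below recreates the zero-filled image the way the dd command does and then tests whether a file carries the ext2 superblock magic number (0xEF53, stored little-endian at byte 56 of the superblock, which starts 1024 bytes into the volume). The function names are hypothetical; running mkfs and mount themselves is left to the shell.

```python
import os
import struct
import tempfile

EXT2_SUPERBLOCK_OFFSET = 1024   # the superblock starts 1024 bytes in
EXT2_MAGIC_OFFSET = 56          # offset of the s_magic field in the superblock
EXT2_MAGIC = 0xEF53


def create_zero_image(path, block_size=1024, count=512):
    """Rough equivalent of: dd if=/dev/zero of=path bs=block_size count=count"""
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(bytes(block_size))   # bytes(n) is n zero bytes
    return os.path.getsize(path)


def has_ext2_magic(path):
    """True if the file has the ext2/3/4 magic number where a superblock would be."""
    with open(path, "rb") as f:
        f.seek(EXT2_SUPERBLOCK_OFFSET + EXT2_MAGIC_OFFSET)
        raw = f.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT2_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "fstest")
    print(create_zero_image(image))   # size in bytes of the new image
    print(has_ext2_magic(image))      # False until mkfs.ext2 has been run on it
```

After mkfs.ext2 has been run on the image, `has_ext2_magic` should return True, which is one way to see that "creating a file system" just means writing structures such as the superblock into the file.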
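2 MTD Technology
MTD stands for Memory Technology Device. It is really a virtual device driver layer, similar in spirit to the Virtual File System: it provides a standard API for the device drivers that operate raw flash. So what is the difference between a flash device and an ordinary block device?
  • An ordinary block device has only two operations: read and write.
  • A flash device has three: read, write and erase. It also needs a wear leveling algorithm.
The point to stress here is:
SD/MMC cards, CF (CompactFlash) cards, USB flash drives and the like are not MTD devices, because they already contain a built-in Flash Translation Layer that takes care of erasing and wear leveling (this translation layer is presumably implemented in firmware). These devices are therefore used directly as ordinary block devices.
(The description above still does not make clear how MTD is actually used; I will analyze it together with the source code later.)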

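2.1 Enabling MTD Support in the Kernel
This is simple: just turn it on in make menuconfig. There are several options.
Figure 1: MTD configuration options in the Linux kernel
Among them:
  • MTD_CHAR and MTD_BLOCK enable reading and writing MTD devices in character mode and block mode. These mean the same as for ordinary char and block devices.
  • The last two options configure an MTD test driver in the kernel. 8192 (in KB) sets the total size and 128 (in KB) sets the erase block size. In effect the kernel creates a virtual flash device for testing.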
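As a quick sanity check on those two numbers, the geometry of the test device follows from simple division. The helper name below is hypothetical, and both values are assumed to be in KB as described above.

```python
def erase_block_count(total_kb, erase_kb):
    """Number of erase blocks in a device of total_kb with erase_kb blocks."""
    if erase_kb <= 0 or total_kb % erase_kb:
        raise ValueError("total size must be a positive multiple of the erase size")
    return total_kb // erase_kb


blocks = erase_block_count(8192, 128)
print(blocks)   # 8192 KB / 128 KB per block
```

So the virtual flash device used for testing would consist of 64 erase blocks of 128 KB each.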
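How are MTD and its related pieces configured on an embedded system?
  • Set up partitions on the flash disk (the whole device may also be a single partition. BTW, I have never fully worked out what partitions are for; this may be for historical reasons).
  • Specify the flash type and location. Flash devices are either NOR or NAND; the end of this section briefly covers the differences.
  • Choose a suitable driver for the flash chip.
  • Configure that driver in the Linux kernel.
First, partitioning.
Flash can be partitioned, and one fairly important question is how the partition information is passed to the Linux kernel. There are two methods:
  • Store the partition layout of the whole device in one block, so that the bootloader can build the corresponding information from that block at startup. Apparently only RedBoot supports this, hence the name RedBoot Partition Table. The Linux kernel can also recognize this kind of partitioning, reading the table through CFI (Common Flash Interface).
  • Kernel Command Line Partitioning: the layout is passed in as a parameter when the kernel boots, which requires the kernel to be configured for it. The command format is as follows:
Figure 2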
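The figure is not reproduced here. As an assumption, the sketch below takes the format to be the kernel's `mtdparts=` syntax, `mtdparts=<mtd-id>:<size>[@<offset>](<name>)[ro],...`, where a size of `-` means "the rest of the device". It handles a single device only, and the mtd-id and partition names in the example are made up.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_NUM = r"(?:0x[0-9a-fA-F]+|\d+)[kKmMgG]?"
_PART_RE = re.compile(
    r"^(?P<size>-|" + _NUM + r")"
    r"(?:@(?P<offset>" + _NUM + r"))?"
    r"(?:\((?P<name>[^)]*)\))?"
    r"(?P<ro>ro)?$"
)


def _to_bytes(text):
    """Convert '256k', '1m' or '0x40000' to a byte count."""
    unit = text[-1].lower() if text[-1] in "kKmMgG" else ""
    digits = text[:-1] if unit else text
    base = 16 if digits.lower().startswith("0x") else 10
    return int(digits, base) * _UNITS[unit]


def parse_mtdparts(arg):
    """Parse a single-device mtdparts string into (mtd_id, partitions)."""
    if arg.startswith("mtdparts="):
        arg = arg[len("mtdparts="):]
    mtd_id, _, defs = arg.partition(":")
    parts, next_offset = [], 0
    for part_def in defs.split(","):
        m = _PART_RE.match(part_def)
        if m is None:
            raise ValueError("bad partition definition: %r" % part_def)
        size = None if m.group("size") == "-" else _to_bytes(m.group("size"))
        offset = _to_bytes(m.group("offset")) if m.group("offset") else next_offset
        parts.append({"name": m.group("name"), "offset": offset,
                      "size": size, "read_only": bool(m.group("ro"))})
        if size is not None:
            next_offset = offset + size   # next partition follows this one
    return mtd_id, parts


# Hypothetical layout, for illustration only.
mtd_id, parts = parse_mtdparts(
    "mtdparts=physmap-flash.0:256k(bootloader)ro,128k(env),-(rootfs)")
for p in parts:
    print(mtd_id, p)
```

Each partition without an explicit offset starts where the previous one ended, so a layout can be described with sizes alone.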
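Next, the driver mapping, that is, pairing MTD with the corresponding flash driver...
kernel/drivers/mtd/maps......, to be analyzed later.
And the driver for the flash chip itself?
kernel/drivers/mtd/chips. The CFI interface is currently the more popular one.
3 References and Supplementary Notes
The original aim of MTD is: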
We're working on a generic Linux subsystem for memory devices, especially Flash devices.
The aim of the system is to make it simple to provide a driver for new hardware, by providing a generic interface between the hardware drivers and the upper layers of the system.
Hardware drivers need to know nothing about the storage formats used, such as FTL, FFS2, etc., but will only need to provide simple routines for read, write and erase. Presentation of the device's contents to the user in an appropriate form will be handled by the upper layers of the system.
MTD overview
MTD subsystem (stands for Memory Technology Devices) provides an abstraction layer for raw flash devices. It makes it possible to use the same API when working with different flash types and technologies, e.g. NAND, OneNAND, NOR, AG-AND, ECC'd NOR, etc.
MTD subsystem does not deal with block devices like MMC, eMMC, SD, CompactFlash, etc. These devices are not raw flashes but they have a Flash Translation layer inside, which makes them look like block devices. These devices are the subject of the Linux block subsystem, not MTD. Please, refer to this FAQ section for a short list of the main differences between block and MTD devices. And the raw flash vs. FTL devices UBIFS section discusses this in more details.
MTD subsystem has the following interfaces.
  • MTD character devices - usually referred to as /dev/mtd0, /dev/mtd1, and so on. These character devices provide I/O access to the raw flash. They support a number of ioctl calls for erasing eraseblocks, marking them as bad or checking if an eraseblock is bad, getting information about MTD devices, etc. (Note that /dev/mtdX is, surprisingly, a char device!)
  • The sysfs interface is relatively newer and it provides full information about each MTD device in the system. This interface is easily extensible and developers are encouraged to use the sysfs interface instead of older ioctl or /proc/mtd interfaces, when possible.
  • The /proc/mtd proc file system file provides general MTD information. This is a legacy interface and the sysfs interface provides more information.
MTD subsystem supports bare NAND flashes with software and hardware ECC, OneNAND flashes, CFI (Common Flash Interface) NOR flashes, and other flash types.
Additionally, MTD supports legacy FTL/NFTL "translation layers", M-Systems' DiskOnChip 2000 and Millennium chips, and PCMCIA flashes (pcmciamtd driver). But the corresponding drivers are very old and not maintained very much.
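As an illustration of the legacy /proc/mtd interface mentioned above, the sketch below parses its usual layout: a header line followed by one line per device with the size and erase size in hexadecimal and the name in quotes. The sample text and partition names are made up, and `parse_proc_mtd` is a hypothetical helper.

```python
import re

_LINE = re.compile(
    r'^(mtd\d+):\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+"(.*)"$')


def parse_proc_mtd(text):
    """Return one dict per MTD device listed in /proc/mtd-style text."""
    devices = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if m is None:
            continue                      # skips the header line
        dev, size, erasesize, name = m.groups()
        size, erasesize = int(size, 16), int(erasesize, 16)
        devices.append({
            "dev": dev,
            "size": size,
            "erasesize": erasesize,
            "eraseblocks": size // erasesize,
            "name": name,
        })
    return devices


# Made-up sample in the /proc/mtd layout, for illustration only.
SAMPLE = '''dev:    size   erasesize  name
mtd0: 00040000 00020000 "bootloader"
mtd1: 007c0000 00020000 "rootfs"
'''

for d in parse_proc_mtd(SAMPLE):
    print(d)
```

On a target with MTD enabled, the same function could be applied to the contents of /proc/mtd; the sysfs interface exposes more detail but is not covered by this sketch.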
MTD Block Driver:
The mtdblock driver available in the MTD is an archaic tool which emulates block devices on top of MTD devices. It does not even have bad eraseblock handling, so it is not really usable with NAND flashes. And it works by caching a whole flash erase block in RAM, modifying it as requested, then erasing the whole block and writing back the modified. This means that mtdblock does not try to do any optimizations, and that you will lose lots of data in case of power cuts. And last, but not least, mtdblock does not do any wear-leveling.
Often people consider mtdblock as general FTL layer and try to use block-based file systems on top of bare flashes using mtdblock. This is wrong in most cases. In other words, please, do not use mtdblock unless you know exactly what you are doing. There is also a read-only version of this driver which doesn't have the capacity to do the caching and erase/writeback, mainly for use with uCLinux where the extra RAM requirement was considered too large.
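The data-loss window described above can be made concrete with a small simulation of the cache, erase and write-back cycle. This is a toy model of the behaviour the quoted text describes, not the mtdblock driver's code; the function name and the power-failure flag are hypothetical.

```python
ERASED = 0xFF


def mtdblock_style_update(flash, block_size, offset, new_byte,
                          power_fails_after_erase=False):
    """Update one byte via read / modify / erase / write-back of a whole block."""
    start = (offset // block_size) * block_size
    cache = bytearray(flash[start:start + block_size])   # 1. cache block in RAM
    cache[offset - start] = new_byte                      # 2. modify the cache
    flash[start:start + block_size] = bytes([ERASED]) * block_size  # 3. erase
    if power_fails_after_erase:
        return False              # the RAM cache is gone, the block stays erased
    flash[start:start + block_size] = cache               # 4. write back
    return True


flash = bytearray(b"A" * 8 + b"B" * 8)    # two tiny 8-byte "erase blocks"
mtdblock_style_update(flash, 8, 9, ord("X"))
print(bytes(flash))                        # second block now holds the change

mtdblock_style_update(flash, 8, 9, ord("Y"), power_fails_after_erase=True)
print(bytes(flash))                        # second block lost entirely
```

A one-byte change puts the entire erase block at risk, which is why the quoted text warns against treating mtdblock as a general FTL.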
These are the modules which provide interfaces that can be used directly from userspace. The user modules currently planned include:
  • Raw character access: A character device which allows direct access to the underlying memory. Useful for creating filesystems on the devices, before using some of the translation drivers below, or for raw storage on infrequently-changed flash, or RAM devices.
  • Raw block access: A block device driver which allows you to pretend that the flash is a normal device with sensible sector size. It actually works by caching a whole flash erase block in RAM, modifying it as requested, then erasing the whole block and writing back the modified data. This allows you to use normal filesystems on flash parts. Obviously it's not particularly robust when you are writing to it - you lose a whole erase block's worth of data if your read/modify/erase/rewrite cycle actually goes read/modify/erase/poweroff. But for development, and for setting up filesystems which are actually going to be mounted read-only in production units, it should be fine. There is also a read-only version of this driver which doesn't have the capacity to do the caching and erase/writeback, mainly for use with uCLinux where the extra RAM requirement was considered too large.
  • Flash Translation Layer (FTL), NFTL: Block device drivers which implement an FTL/NFTL filesystem on the underlying memory device. FTL is fully functional. NFTL is currently working for both reading and writing, but could probably do with some more field testing before being used on production systems.
  • Journalling Flash File System, v2: This provides a filesystem directly on the flash, rather than emulating a block device. For more information, see sources.redhat.com.
MTD hardware device drivers

These provide physical access to memory devices, and are not used directly - they are accessed through the user modules above.

  • On-board memory: Many PC chipsets are incapable of correctly caching system memory above 64M or 512M. A driver exists which allows you to use this memory with the linux-mtd system.
  • PCMCIA devices: PCMCIA flash (not CompactFlash but real flash) cards are now supported by the pcmciamtd driver in CVS.
  • Common Flash Interface (CFI) onboard NOR flash: This is a common solution and is well-tested and supported, most often using JFFS2 or cramfs file systems.
  • Onboard NAND flash: NAND flash is rapidly overtaking NOR flash due to its larger size and lower cost; JFFS2 support for NAND flash is approaching production quality.
  • M-Systems' DiskOnChip 2000 and Millennium: The DiskOnChip 2000, Millennium and Millennium Plus devices should be fully supported, using their native NFTL and INFTL 'translation layers'. Support for JFFS2 on DiskOnChip 2000 and Millennium is also operational although lacking proper support for bad block handling.
This brings up NOR and NAND. So what is the difference between the two?
Beside the different silicon cell design, the most important difference between NAND and NOR Flash is the bus interface. NOR Flash is connected to a address / data bus direct like other memory devices as SRAM etc. NAND Flash uses a multiplexed I/O Interface with some additional control pins. NAND flash is a sequential access device appropriate for mass storage applications, while NOR flash is a random access device appropriate for code storage application. NOR Flash can be used for code storage and code execution. Code stored on NAND Flash can't be executed from there. It must be loaded into RAM memory and executed from there.
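  • NOR can be connected directly to the CPU's address and data bus, just like memory. NAND cannot, because it needs a multiplexed I/O interface with additional control pins. So NAND behaves more like a disk and NOR more like memory.
  • NOR is more expensive than NAND. NAND is suited to sequential access, while NOR supports random access.
  • Consequently code can be stored in NOR and executed in place, with the CPU fetching instructions from it directly. NAND cannot do this (mainly because the CPU cannot address NAND directly when fetching); code on NAND must first be loaded into RAM.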