smartctl-badblock

How to use smartctl to identify bad blocks

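Recommendation: if you run into bad blocks, it is still best to buy a new HDD and back up the data to it. With conv=sync,noerror, dd keeps going after read errors and pads unreadable input blocks with zeros.
sudo dd if=/dev/sda of=/dev/sdb conv=sync,noerror
/dev/sda: source disk (has bad blocks)
/dev/sdb: destination disk (good HDD)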

Changelog

item	note
20170629	First version

smartctl

Getting the disk's self-test data

smartctl -l selftest /dev/sda
-l : show device log
The selftest log only contains data after a self-test has been run.
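Since the log is empty until a test has run, a test has to be started first. A minimal sketch, assuming a POSIX shell; the function name `start_selftest` is illustrative and not from the original, while `smartctl -t` with `short`, `long` or `conveyance` is the standard way to start a test:

```shell
# Hypothetical helper: start a SMART self-test of the given type.
start_selftest() {
    dev=$1
    kind=${2:-short}
    case "$kind" in
        short|long|conveyance) ;;
        *) echo "unsupported test type: $kind" >&2; return 1 ;;
    esac
    smartctl -t "$kind" "$dev"
}

# Usage (the test runs inside the drive; read the log once it has finished):
#   start_selftest /dev/sda long
#   smartctl -l selftest /dev/sda
```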

# smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      9787         191105024
# 2  Short offline       Completed: read failure       90%      9786         191105024
# 3  Extended offline    Completed: read failure       90%      9579         191105030
# 4  Extended offline    Completed: read failure       90%      9578         191105024
# 5  Short offline       Completed: read failure       90%      9578         191105026
# 6  Conveyance offline  Completed: read failure       90%      9578         191105028
# 7  Conveyance offline  Completed: read failure       90%      9578         191105030
# 8  Extended offline    Completed: read failure       90%      9578         191105024

The log shows many LBA errors.
LBA: Logical Block Address
The LBA counts sectors in units of 512 bytes and starts at zero.
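To collect the failing addresses from a long log, the LBA_of_first_error column can be pulled out with awk. This is a sketch that assumes the log layout shown above; the function name `failing_lbas` is illustrative:

```shell
# Print the unique LBA_of_first_error values from `smartctl -l selftest`
# output read on stdin. Entries without an error show "-" and are skipped.
failing_lbas() {
    awk '/^# *[0-9]+ / && $NF ~ /^[0-9]+$/ { print $NF }' | sort -nu
}

# Usage:
#   smartctl -l selftest /dev/sda | failing_lbas
```

For the log above this should list four distinct addresses between 191105024 and 191105030.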

smartctl -A /dev/sda

# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       4699
  3 Spin_Up_Time            0x0027   190   175   021    Pre-fail  Always       -       3466
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1531
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       11
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       9807
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1474
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       1462
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       68
194 Temperature_Celsius     0x0022   109   098   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       5
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
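The references at the end of this post single out the reallocated, pending and uncorrectable sector counts (IDs 5, 196, 197 and 198 in the table above). A small filter, assuming the ten-column layout shown above with RAW_VALUE in the last column, can print just those rows; the function name `key_sector_attrs` is illustrative:

```shell
# Print ID, name and raw value of the sector-health attributes
# from `smartctl -A` output read on stdin.
key_sector_attrs() {
    awk '$1 ~ /^(5|196|197|198)$/ { print $1, $2, $10 }'
}

# Usage:
#   smartctl -A /dev/sda | key_sector_attrs
```

In the output above, only Current_Pending_Sector has a non-zero raw value (5).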

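Locating the partition
Take the bad block at sector 191105024 as the example.
/dev/sda1 starts at sector 1 (see the fdisk output below), so the offset within the partition is:
191105024 - 1 = 191105023
That is, the bad block is sector 191105023 within /dev/sda1.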

# fdisk -lu /dev/sda

Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks  Id System
/dev/sda1               1  3907029167  1953514583+ ee EFI GPT
#

tune2fs -l /dev/sda1
No superblock was found.
It appears to have been wiped with dd earlier.

# ./tune2fs -l /dev/sda1
tune2fs 1.41.11 (14-Mar-2010)
./tune2fs: Bad magic number in super-block while trying to open /dev/sda1
Couldn't find valid filesystem superblock.
# 

On a different board, with another HDD:
(none) :[/tmp/tt]# ./tune2fs -l /dev/sda1 | grep Block
Block count:              976629504
Block size:               4096
Blocks per group:         32768

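Use the block size B from when the disk was previously formatted (Block size: 4096) to convert the sector offset into a file-system block number:
b = 191105023 / 4 = 47776255.75
(Strictly, 191105023 * 512 / 4096 is 23888127.875; the commands in this post use the offset divided by 4.)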
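For reference, the Bad block HOWTO listed in the references gives the conversion as b = (int)((L - S) * 512 / B), where L is the failing LBA, S is the partition's start sector and B is the file-system block size in bytes. A minimal sketch of that formula, assuming B is a multiple of 512; the function name `lba_to_fs_block` is illustrative:

```shell
# Hypothetical helper: b = (int)((L - S) * 512 / B)
# L = failing LBA, S = partition start sector, B = file-system block size in bytes.
lba_to_fs_block() {
    lba=$1; start=$2; bsize=$3
    # B is assumed to be a multiple of 512, so dividing by (B / 512) gives the
    # same integer result while keeping the intermediate value small.
    echo $(( (lba - start) / (bsize / 512) ))
}
```

With the values in this post, `lba_to_fs_block 191105024 1 4096` prints 23888127. The dd commands in this post use 47776255 and 47776256 instead, so it is worth confirming with a read that the block about to be overwritten is really the failing one.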

Clearing the bad block

Writing zeros over the block destroys whatever data it held.

root]# dd if=/dev/zero of=/dev/sda1 bs=4096 count=1 seek=47776255
root]# sync

Example: clearing a bad block

Taking sector 191105024 as the example:
191105024/4 = 47776256

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      9787         191105024

Confirm that the bad block exists

# dd if=/dev/sda1 count=1 bs=4096 skip=47776256
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000000
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/08:00:00:18:c8/00:00:16:00:00/40 tag 0 ncq 4096 in
         res 41/40:00:00:18:c8/00:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1: EH complete
xxx
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda]  Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda]  Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
        16 c8 18 00 
sd 0:0:0:0: [sda]  ASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 16 c8 18 00 00 00 08 00
end_request: I/O error, dev sda, sector 382212096
Buffer I/O error on device sda1, logical block 47776256
ata1: EH complete
dd: /dev/sda1: Input/output error

# dd if=/dev/zero of=/dev/sda1 count=1 bs=4096 seek=47776256
1+0 records in
1+0 records out
4096 bytes (4.0KB) copied, 0.000665 seconds, 5.9MB/s

Confirm that the block now reads normally.
A long self-test should still be run once more with smartctl -t long /dev/sda.

# dd if=/dev/sda1 count=1 bs=4096 skip=47776256
1+0 records in
1+0 records out
4096 bytes (4.0KB) copied, 0.000419 seconds, 9.3MB/s
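The read, overwrite, re-read sequence above can be wrapped so that a block is only overwritten when the read actually fails. This is a sketch under assumptions, not the author's script: the function name `zero_if_unreadable` is illustrative, and `conv=notrunc` is added only so the helper is also safe to try on a regular file.

```shell
# Hypothetical helper: zero one block only if it cannot be read.
# WARNING: the write destroys whatever data was stored in that block.
zero_if_unreadable() {
    dev=$1; block=$2; bs=${3:-4096}
    if dd if="$dev" of=/dev/null bs="$bs" count=1 skip="$block" 2>/dev/null; then
        echo "block $block is readable; nothing written"
        return 0
    fi
    echo "block $block is unreadable; overwriting with zeros"
    dd if=/dev/zero of="$dev" bs="$bs" count=1 seek="$block" conv=notrunc 2>/dev/null
    sync
    # Re-read to confirm; the exit status of this dd is the function's result.
    dd if="$dev" of=/dev/null bs="$bs" count=1 skip="$block" 2>/dev/null
}

# Usage:
#   zero_if_unreadable /dev/sda1 47776256 4096
```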

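References

Bad block HOWTO for smartmontools
How to remove Bad Sectors from HD
Linux 上處理壞軌硬碟的兩三事 (A few notes on handling disks with bad sectors on Linux)
- The values most closely tied to disk failure are:
  Reallocated Sectors Count
  Reallocation Event Count
  Current Pending Sector Count
  Uncorrectable Sector Count
  These values indicate that the drive has found sectors that are already damaged.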
