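How to confirm bad blocks with smartctl
Recommendation: if a drive develops bad blocks, buy a new HDD and copy the data onto it.
sudo dd if=/dev/sda of=/dev/sdb conv=sync,noerror
/dev/sda (the drive with bad blocks)
/dev/sdb (the healthy drive)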
Changelog
item | note |
---|---|
20170629 | First version |
smartctl
Getting the drive's self-test data
- smartctl -l selftest /dev/sda
-l : show device log
The self-test log only has entries after a self-test has been run, so run a test first (for example smartctl -t long /dev/sda).
# smartctl -l selftest /dev/sda
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 9787 191105024
# 2 Short offline Completed: read failure 90% 9786 191105024
# 3 Extended offline Completed: read failure 90% 9579 191105030
# 4 Extended offline Completed: read failure 90% 9578 191105024
# 5 Short offline Completed: read failure 90% 9578 191105026
# 6 Conveyance offline Completed: read failure 90% 9578 191105028
# 7 Conveyance offline Completed: read failure 90% 9578 191105030
# 8 Extended offline Completed: read failure 90% 9578 191105024
The log shows many LBA errors.
LBA: Logical Block Address
The LBA counts sectors in units of 512 bytes, and starts at zero
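The failing LBAs in the self-test log above repeat across several tests, so it can help to reduce them to a distinct list. The following is a minimal sketch, assuming the LBA is always the last column of each log entry as in the output shown here; `failing_lbas` is a hypothetical helper name, not part of smartmontools.

```shell
#!/bin/sh
# List the distinct LBA_of_first_error values in a SMART self-test log.
# Reads `smartctl -l selftest` output on stdin.
# failing_lbas is a hypothetical helper name, not a smartmontools command.
failing_lbas() {
    # Log entries start with "#". On a failed test the last field is the LBA;
    # on a passed test it is "-", which the numeric check filters out.
    awk '/^#/ && $NF ~ /^[0-9]+$/ { print $NF }' | sort -n -u
}

# Sample lines copied from the log in this post.
sample='# 1 Extended offline Completed: read failure 90% 9787 191105024
# 2 Short offline Completed: read failure 90% 9786 191105024
# 3 Extended offline Completed: read failure 90% 9579 191105030'

printf '%s\n' "$sample" | failing_lbas
```

On a live system the same function could be fed directly, for example `smartctl -l selftest /dev/sda | failing_lbas`.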
smartctl -A /dev/sda
# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 4699
3 Spin_Up_Time 0x0027 190 175 021 Pre-fail Always - 3466
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1531
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 11
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9807
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1474
192 Power-Off_Retract_Count 0x0032 199 199 000 Old_age Always - 1462
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 68
194 Temperature_Celsius 0x0022 109 098 000 Old_age Always - 38
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 5
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
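200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
Find the partition location
Take the bad block at sector 191105024 as an example.
191105024 - 1 = 191105023 (the LBA minus the partition's starting sector as reported by fdisk)
So sector 191105023 within /dev/sda1 is the bad block.
Caveat: the fdisk output below shows Id ee (a GPT protective entry), so the Start value of 1 may not be where the data partition really begins; check it with a GPT-aware tool before relying on this offset.
# fdisk -lu /dev/sda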
Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 3907029167 1953514583+ ee EFI GPT
tune2fs -l /dev/sda1
No superblock was found.
It appears the disk had previously been wiped with dd.
# ./tune2fs -l /dev/sda1
tune2fs 1.41.11 (14-Mar-2010)
./tune2fs: Bad magic number in super-block while trying to open /dev/sda1
Couldn't find valid filesystem superblock.
#
On a different board, with another HDD:
(none) :[/tmp/tt]# ./tune2fs -l /dev/sda1 | grep Block
Block count: 976629504
Block size: 4096
Blocks per group: 32768
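Based on how the disks were formatted before, use B = 4096 (the file system block size).
b = 191105023*512/4096 = 23888127.875, so the bad sector falls in block 23888127.
Note: the commands recorded below use 47776255 and 47776256, which equal 191105023/4 and 191105024/4. With 512-byte sectors and a 4096-byte block the divisor is 8, so recompute the block number for your own disk before writing to it.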
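The conversion from a disk LBA to a file system block follows the formula from the Bad block HOWTO, b = int((L - S) * 512 / B). Below is a small sketch of that arithmetic; `fs_block` is a hypothetical helper, and it assumes the block size is a whole multiple of the sector size. The second call uses a partition start of 2048 purely as an illustrative assumption.

```shell
#!/bin/sh
# fs_block LBA PART_START BLOCK_SIZE [SECTOR_SIZE]
# Prints the file system block that contains a given disk LBA:
#   b = int((L - S) * sector_size / B)
# fs_block is a hypothetical helper name. Dividing by the ratio B/sector_size
# gives the same truncated result without a large intermediate product,
# provided BLOCK_SIZE is a multiple of SECTOR_SIZE.
fs_block() {
    lba=$1 start=$2 bsize=$3 ssize=${4:-512}
    echo $(( ($lba - $start) / ($bsize / $ssize) ))
}

# Values used in this post: LBA 191105024, start sector 1, 4096-byte blocks.
fs_block 191105024 1 4096

# Same LBA with an assumed (illustrative) partition start of 2048.
fs_block 191105024 2048 4096
```

Shell arithmetic truncates toward zero, which matches the int() in the formula, so no separate rounding step is needed.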
- Clear the bad block
root]# dd if=/dev/zero of=/dev/sda1 bs=4096 count=1 seek=47776255
root]# sync
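Because the overwrite destroys whatever the block held, it can be worth printing the command for review before running it. The sketch below only builds the command string and rejects non-integer block numbers (a computed value such as 47776255.75 has to be truncated first); `zero_block_cmd` is a hypothetical helper, and the default block size of 4096 is an assumption taken from the tune2fs output above.

```shell
#!/bin/sh
# zero_block_cmd DEVICE BLOCK [BLOCK_SIZE]
# Prints (does not run) the dd command that overwrites one file system
# block with zeros. zero_block_cmd is a hypothetical helper name.
zero_block_cmd() {
    dev=$1 block=$2 bsize=${3:-4096}
    # Refuse anything that is not a plain non-negative integer.
    case $block in
        ''|*[!0-9]*)
            echo "block must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf 'dd if=/dev/zero of=%s bs=%s count=1 seek=%s\n' "$dev" "$bsize" "$block"
}

# Reproduces the command used in this post without executing it.
zero_block_cmd /dev/sda1 47776255
```

After checking the printed line, it can be pasted into a root shell and followed by `sync`, as in the steps above.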
Example: clearing a bad block
Take sector 191105024 as the example.
191105024/4 = 47776256
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 9787 191105024
Confirm that the bad block exists
# dd if=/dev/sda1 count=1 bs=4096 skip=47776256
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000000
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/08:00:00:18:c8/00:00:16:00:00/40 tag 0 ncq 4096 in
res 41/40:00:00:18:c8/00:00:16:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
ata1: EH complete
xxx
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
16 c8 18 00
sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB: cdb[0]=0x28: 28 00 16 c8 18 00 00 00 08 00
end_request: I/O error, dev sda, sector 382212096
Buffer I/O error on device sda1, logical block 47776256
ata1: EH complete
dd: /dev/sda1: Input/output error
Overwrite the block with dd
# dd if=/dev/zero of=/dev/sda1 count=1 bs=4096 seek=47776256
1+0 records in
1+0 records out
4096 bytes (4.0KB) copied, 0.000665 seconds, 5.9MB/s
Confirm the block now reads normally
The drive still needs to be tested once more with smartctl -t long /dev/sda.
# dd if=/dev/sda1 count=1 bs=4096 skip=47776256
1+0 records in
1+0 records out
4096 bytes (4.0KB) copied, 0.000419 seconds, 9.3MB/s
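The kernel log from the failed read earlier in this example reports the same error twice: as an absolute disk sector (`end_request ... sector 382212096`) and as a block within sda1 (`logical block 47776256`). Subtracting one from the other gives the partition start the kernel is actually using, which is a useful cross-check on the offset used in the block calculation. This is a sketch assuming 512-byte sectors and 4096-byte blocks; `part_start` is a hypothetical helper name.

```shell
#!/bin/sh
# part_start DISK_SECTOR PART_BLOCK [BLOCK_SIZE] [SECTOR_SIZE]
# Given the absolute disk sector and the partition-relative block that the
# kernel logged for the same I/O error, print the implied partition start
# sector. part_start is a hypothetical helper name.
part_start() {
    sector=$1 block=$2 bsize=${3:-4096} ssize=${4:-512}
    echo $(( $sector - $block * ($bsize / $ssize) ))
}

# Values from the "end_request" and "Buffer I/O error" lines in this post.
part_start 382212096 47776256

# The CDB bytes "16 c8 18 00" in the log are the same sector in hexadecimal.
printf '%d\n' 0x16c81800
```

For the values logged here the first command prints 2048, which differs from the Start of 1 that fdisk -lu shows for the protective GPT entry.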
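References
- Bad block HOWTO for smartmontools
- How to remove Bad Sectors from HD
- Linux 上處理壞軌硬碟的兩三事 (A few notes on handling hard disks with bad sectors on Linux)
- The values most strongly correlated with drive failure are:
Reallocated Sectors Count
Reallocation Event Count
Current Pending Sector Count
Uncorrectable Sector Count
These attributes indicate that the drive has found the affected sectors to be damaged.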
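The four attributes listed above can be pulled out of `smartctl -A` output so they are easy to watch over time. The sketch below assumes they correspond to attribute IDs 5, 196, 197 and 198 and that the output has the 10-column layout shown in this post; `sector_attrs` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the name and raw value of the attributes most associated with
# damaged sectors. Reads `smartctl -A` output on stdin.
# sector_attrs is a hypothetical helper name, not a smartmontools command.
sector_attrs() {
    # Assumed layout: ID in column 1, attribute name in column 2,
    # raw value in column 10. IDs 5, 196, 197, 198 are an assumed mapping
    # for the four counts named in the reference above.
    awk '$1 == 5 || $1 == 196 || $1 == 197 || $1 == 198 { print $2, $10 }'
}

# Sample rows copied from the attribute table in this post.
sample='5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
9 Power_On_Hours 0x0032 087 087 000 Old_age Always - 9807
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 5
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0'

printf '%s\n' "$sample" | sector_attrs
```

In the sample only Current_Pending_Sector has a non-zero raw value (5), which is consistent with the read failures in the self-test log.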