Encountering smartctl FAILED: understanding the cause of the error
Update log
Date | Note |
---|---|
20170120 | First version |
Smartctl Reallocated_Sector_Ct
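There are physical issues already reported by S.M.A.R.T.
This indicates that physical sector damage has already been detected.
When the hard drive finds a read/write/verification error, it marks that sector as “reallocated” and transfers the data to a special reserved area (the spare area).
The drive verifies what it writes (it writes the data, then reads it back to check it); if a problem occurs, it uses the spare area inside the drive instead.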
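The remapping described above can be pictured with a toy Python model. This is an illustration only, assumed for explanation: real drive firmware is far more involved and is not documented in this post. `ToyDisk` and its methods are hypothetical names. Bad logical sectors are redirected to slots in a fixed-size spare area, the number of remapped sectors plays the role of the Reallocated_Sector_Ct RAW_VALUE, and once the spare area is full a new bad sector can no longer be hidden from the system.

```python
from typing import Dict


class ToyDisk:
    """Toy model of sector reallocation; not how any real firmware is written."""

    def __init__(self, spare_slots: int) -> None:
        self.spare_slots = spare_slots
        self.remap: Dict[int, int] = {}  # bad logical sector -> spare-area slot

    def report_bad_sector(self, lba: int) -> bool:
        """Remap a sector that failed verification; False once no spare is left."""
        if lba in self.remap:
            return True  # already served from the spare area
        if len(self.remap) >= self.spare_slots:
            return False  # spare area exhausted: the bad sector stays visible
        self.remap[lba] = len(self.remap)
        return True

    def physical_location(self, lba: int) -> str:
        """Where a read of `lba` is actually served from."""
        if lba in self.remap:
            return f"spare[{self.remap[lba]}]"
        return f"lba[{lba}]"

    @property
    def reallocated_count(self) -> int:
        """Plays the role of Reallocated_Sector_Ct's RAW_VALUE."""
        return len(self.remap)
```

With `ToyDisk(spare_slots=2)`, the first two bad sectors are remapped and counted, and the third call to `report_bad_sector` returns `False`.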
RAW_VALUE
Records the number of sectors that are actually damaged.
The raw value normally represents a count of the bad sectors that have been found and remapped.
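A minimal sketch for pulling RAW_VALUE (and the other columns) out of one row of the attribute table, assuming the ten-column layout printed by smartctl 6.2 in this post. `SmartAttribute`, `parse_attribute_row` and `raw_count` are hypothetical helper names. RAW_VALUE is the last column, so the row is split into at most ten fields; a raw value that contains spaces stays in one piece.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    flag: int
    value: int
    worst: int
    thresh: int
    type: str
    updated: str
    when_failed: str
    raw_value: str


def parse_attribute_row(line: str) -> SmartAttribute:
    """Split one row of the 'Vendor Specific SMART Attributes' table."""
    # At most 10 fields: RAW_VALUE is last and may contain spaces.
    f = line.split(maxsplit=9)
    if len(f) != 10:
        raise ValueError(f"unexpected attribute row: {line!r}")
    return SmartAttribute(
        attr_id=int(f[0]),
        name=f[1],
        flag=int(f[2], 16),  # e.g. "0x0033"
        value=int(f[3]),
        worst=int(f[4]),
        thresh=int(f[5]),
        type=f[6],
        updated=f[7],
        when_failed=f[8],
        raw_value=f[9],
    )


def raw_count(attr: SmartAttribute) -> int:
    """Leading integer of RAW_VALUE (e.g. the number of reallocated sectors)."""
    return int(attr.raw_value.split()[0])
```

For the Reallocated_Sector_Ct row of /dev/sdb in this post, `raw_count` gives 2770.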
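- The following attributes record serious errors (when they exceed the safe range):
ID | Attribute | Meaning | Notes |
---|---|---|---|
1 | Read Error Rate | Rate of low-level data read errors | |
5 | Reallocated Sector Count | Number of reallocated (remapped) sectors | Once this exceeds a certain value and the spare sectors are used up so that no further remapping is possible, the bad sectors become visible and the drive can no longer repair them by itself |
10 | Spin Retry Count | Spindle motor spin-up retries | Frequent attempts to start the spindle motor suggest the drive may be close to the end of its service life |
196 | Reallocation Event Count | Count of reallocation events | Counts remap events for sectors that have been remapped and sectors that may be remapped |
197 | Current Pending Sector Count | Sectors waiting to be reallocated | Number of unstable sectors |
198 | Uncorrectable Sector Count | Uncorrectable sectors | Number of sectors that are definitely bad |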
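One way to use the table above in a script is to watch those IDs and report any whose raw count is non-zero. This is a rough heuristic assumed for illustration, not a rule from the S.M.A.R.T. specification; in particular, on some drives the raw value of ID 1 is vendor-encoded and can be non-zero on a healthy disk. `WATCHED` and `suspicious` are hypothetical names, and the raw counts are passed in as a plain dict (how they are obtained is left open).

```python
from typing import Dict, List

# Attribute IDs taken from the table above. Flagging any non-zero raw count is a
# simple heuristic assumed here, not a threshold defined by S.M.A.R.T.
WATCHED: Dict[int, str] = {
    1: "Read Error Rate",
    5: "Reallocated Sector Count",
    10: "Spin Retry Count",
    196: "Reallocation Event Count",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
}


def suspicious(raw_by_id: Dict[int, int], watched: Dict[int, str] = WATCHED) -> List[str]:
    """Return 'ID name=raw' for each watched attribute with a non-zero raw count."""
    return [
        f"{attr_id} {name}={raw_by_id[attr_id]}"
        for attr_id, name in sorted(watched.items())
        if raw_by_id.get(attr_id, 0) != 0
    ]
```

Fed the raw values of /dev/sdb from the `smartctl -a` output later in this post (`{1: 0, 5: 2770, 10: 0, 196: 647, 197: 65532, 198: 0}`), it flags IDs 5, 196 and 197.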
smartctl health
Using smartctl
Enable SMART
smartctl -s on -d ata /dev/sda
-s VALUE, --smart=VALUE
Enable/disable SMART on device (on/off)
-d TYPE, --device=TYPE
Specify device type to one of: ata, scsi, sat[,auto][,N][+TYPE], usbcypress[,X], usbjmicron[,p][,x][,N], usbsunplus, marvell, areca,N/E, 3ware,N, hpt,L/M/N, megaraid,N, cciss,N, auto, test
Check the health status periodically
smartctl -H /dev/sda -d auto
-H, --health
Show device SMART health status
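For a periodic health check, a script can rely on smartctl's exit status instead of scraping text. The smartctl(8) man page (RETURN VALUES section) describes the status as a bitmask; the sketch below decodes it. It assumes `smartctl` is on `PATH` and that the script runs with enough privileges (usually root) to open the device. `decode_exit_status` and `check_health` are hypothetical names, and the bit descriptions are paraphrased from the man page.

```python
import subprocess
from typing import List

# Exit-status bits paraphrased from the RETURN VALUES section of smartctl(8).
EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not return IDENTIFY DEVICE data",
    2: "a SMART or other ATA command failed, or a SMART checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_exit_status(status: int) -> List[str]:
    """List the meaning of every bit set in a smartctl exit status."""
    return [msg for bit, msg in sorted(EXIT_BITS.items()) if status & (1 << bit)]


def check_health(device: str) -> List[str]:
    """Run `smartctl -H -d auto DEVICE` and decode its exit status (needs root)."""
    proc = subprocess.run(
        ["smartctl", "-H", "-d", "auto", device],
        capture_output=True,
        text=True,
        check=False,  # a non-zero status is information, not an error
    )
    return decode_exit_status(proc.returncode)
```

Run from cron or a systemd timer, an empty list from `check_health("/dev/sda")` means no bit was set, while the DISK FAILING entry is the cue to back up the data.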
smartctl health messages
PASSED
# smartctl -H /dev/sda -d auto
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
#
FAILED
# smartctl -H /dev/sdb -d auto
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 135 135 140 Pre-fail Always FAILING_NOW 27
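The `-H` output above can also be checked programmatically. This sketch assumes the English smartctl 6.x wording shown in this post; `parse_health` is a hypothetical name. It extracts the overall result and the names of attributes whose WHEN_FAILED column reads FAILING_NOW.

```python
import re
from typing import List, Optional, Tuple

RESULT_RE = re.compile(r"SMART overall-health self-assessment test result:\s*(\w+)")


def parse_health(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (overall result, names of attributes marked FAILING_NOW)."""
    match = RESULT_RE.search(text)
    result = match.group(1) if match else None  # "PASSED" or "FAILED" ("!" dropped)
    failing = []
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[8] == "FAILING_NOW":
            failing.append(fields[1])
    return result, failing
```

For the /dev/sdb output above this returns `("FAILED", ["Reallocated_Sector_Ct"])`; a monitoring script could alert whenever the result is not `"PASSED"` or the list is non-empty.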
smartctl test
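Reallocated_Event_Count: reallocation event count; counts remap events for sectors that have been remapped and sectors that may be remapped
Current_Pending_Sector: count of sectors waiting to be reallocated; records the number of unstable sectors
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 647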
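Why is Reallocated_Sector_Ct reported as FAILING_NOW while Reallocated_Event_Count, whose normalized VALUE has dropped to 001, is not? The WHEN_FAILED column compares the normalized VALUE and WORST with THRESH, not the RAW_VALUE: for ID 5, 135 is at or below the threshold 140, whereas ID 196 has a threshold of 000, which smartmontools treats as "always passing" (the ATA-3 meaning of 0x00). The function below is a simplified approximation of that rule for illustration, not smartmontools' actual code; `when_failed` is a hypothetical name.

```python
def when_failed(value: int, worst: int, thresh: int) -> str:
    """Simplified version of smartctl's WHEN_FAILED column (normalized values only)."""
    if thresh == 0:
        # Threshold 0x00 means "always passing", so the attribute never fails.
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"
```

`when_failed(135, 135, 140)` yields `"FAILING_NOW"` for ID 5 and `when_failed(1, 1, 0)` yields `"-"` for ID 196, matching the smartctl output.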
- smartctl -t short /dev/sdb
# smartctl -t short /dev/sdb
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Jan 20 13:42:35 2017
Use smartctl -X to abort test.
# date
Fri Jan 20 13:41:07 UTC 2017
# date
Fri Jan 20 13:41:44 UTC 2017
# date
Fri Jan 20 13:43:13 UTC 2017
#
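Rather than re-running `date` by hand, a script can compare the current time with the "Test will complete after ..." line. This assumes the completion time is printed in the same time zone as the clock it is compared with, and in an English/C locale so that `%a`/`%b` parse; `self_test_finished` is a hypothetical name. The printed time is only an estimate: the `smartctl -a` output below, taken at 13:43:41, still reports the self-test in progress with 10% remaining, so polling the self-test status is more reliable.

```python
from datetime import datetime

PREFIX = "Test will complete after "


def self_test_finished(smartctl_output: str, now: datetime) -> bool:
    """True once `now` has passed the completion time printed by `smartctl -t`."""
    for line in smartctl_output.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            # ctime-style timestamp, e.g. "Fri Jan 20 13:42:35 2017" (C locale assumed).
            done_at = datetime.strptime(line[len(PREFIX):], "%a %b %d %H:%M:%S %Y")
            return now >= done_at
    raise ValueError("no 'Test will complete after' line found")
```

With the transcript above, the check returns `False` at 13:41:44 and `True` at 13:43:13.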
=========================================================================================
# smartctl -a /dev/sdb
smartctl 6.2 2013-07-26 r3841 [armv7l-linux-3.4.35_hi3535] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD10JUCT-63CYNY0
Serial Number: WD-WX31AC4PVPHA
LU WWN Device Id: 5 0014ee 65aae4265
Firmware Version: 01.01A01
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Fri Jan 20 13:43:41 2017 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 241) Self-test routine in progress...
10% of test remaining.
Total time to complete Offline
data collection: (18420) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 206) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x7035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 183 179 021 Pre-fail Always - 1833
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 441
5 Reallocated_Sector_Ct 0x0033 135 135 140 Pre-fail Always FAILING_NOW 2770
7 Seek_Error_Rate 0x002e 200 194 000 Old_age Always - 4
9 Power_On_Hours 0x0032 088 088 000 Old_age Always - 9398
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 441
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 439
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 140
194 Temperature_Celsius 0x0022 114 099 000 Old_age Always - 33
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 647
197 Current_Pending_Sector 0x0032 001 001 000 Old_age Always - 65532
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0
SMART Error Log Version: 1
ATA Error Count: 9762 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 9762 occurred at disk power-on lifetime: 9396 hours (391 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 24d+01:45:47.162 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 24d+01:45:47.162 IDENTIFY DEVICE
c8 00 08 00 00 00 e0 08 24d+01:45:47.128 READ DMA
ec 00 00 00 00 00 a0 08 24d+01:45:47.116 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 24d+01:45:47.116 SET FEATURES [Set transfer mode]
Error 9761 occurred at disk power-on lifetime: 9396 hours (391 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 08 00 00 00 e0 Device Fault; Error: ABRT 8 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 e0 08 24d+01:45:47.128 READ DMA
ec 00 00 00 00 00 a0 08 24d+01:45:47.116 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 24d+01:45:47.116 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 24d+01:45:47.115 IDENTIFY DEVICE
c8 00 08 00 00 00 e0 08 24d+01:45:47.079 READ DMA
Error 9760 occurred at disk power-on lifetime: 9396 hours (391 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 24d+01:45:47.116 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 24d+01:45:47.115 IDENTIFY DEVICE
c8 00 08 00 00 00 e0 08 24d+01:45:47.079 READ DMA
ec 00 00 00 00 00 a0 08 24d+01:45:47.071 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 24d+01:45:47.070 SET FEATURES [Set transfer mode]
Error 9759 occurred at disk power-on lifetime: 9396 hours (391 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 08 00 00 00 e0 Device Fault; Error: ABRT 8 sectors at LBA = 0x00000000 = 0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 08 00 00 00 e0 08 24d+01:45:47.079 READ DMA
ec 00 00 00 00 00 a0 08 24d+01:45:47.071 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 24d+01:45:47.070 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 24d+01:45:47.070 IDENTIFY DEVICE
c8 00 08 00 00 00 e0 08 24d+01:45:47.026 READ DMA
Error 9758 occurred at disk power-on lifetime: 9396 hours (391 days + 12 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 46 00 00 00 a0 Device Fault; Error: ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 46 00 00 00 a0 08 24d+01:45:47.070 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 24d+01:45:47.070 IDENTIFY DEVICE
c8 00 08 00 00 00 e0 08 24d+01:45:47.026 READ DMA
ec 00 00 00 00 00 a0 08 24d+01:45:47.015 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 24d+01:45:47.014 SET FEATURES [Set transfer mode]
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
#
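Two kinds of numeric fields in the `-a` output above can be decoded by hand. The "Self-test execution status" byte packs a status code in its upper four bits (15 = in progress, 0 = completed without error) and the remaining work in 10% steps in its lower four bits, so 241 = 0xF1 means "in progress, 10% remaining". The error-log rows list the ATA Error (ER) and Status (ST) registers: ER 0x04 is the ABRT bit, and ST 0x61 has DRDY, DF (device fault) and ERR set, which smartctl summarizes as "Device Fault; Error: ABRT". When bit 6 (LBA mode) of DH is set, as in 0xe0, SN, CL, CH and the low nibble of DH form a 28-bit LBA. The sketch below is a partial decoder assumed for illustration: only a few register bits are named, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Partial bit names for the ATA Status (ST) and Error (ER) registers.
ST_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}
ER_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}


def decode_self_test_status(byte: int) -> Tuple[int, int]:
    """Return (status code, percent of the test remaining)."""
    return byte >> 4, (byte & 0x0F) * 10


def _bit_names(value: int, names: Dict[int, str]) -> List[str]:
    # Highest bit first, matching the usual register diagrams.
    return [name for bit, name in sorted(names.items(), reverse=True) if value & (1 << bit)]


def decode_error_registers(row: str) -> dict:
    """Decode the 'ER ST SC SN CL CH DH' hex columns of a SMART error-log row."""
    er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in row.split()[:7])
    lba: Optional[int] = None
    if dh & 0x40:  # bit 6 of Device/Head: LBA addressing
        lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "error": _bit_names(er, ER_BITS),
        "status": _bit_names(st, ST_BITS),
        "count": sc,  # sector count for READ DMA; other commands use SC differently
        "lba28": lba,
    }
```

Applied to the row of Error 9761, this gives ABRT, DRDY/DF/ERR, a count of 8 and LBA 0, consistent with smartctl's "8 sectors at LBA = 0x00000000 = 0".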