├── sample_scripts
│   ├── scsi_fault_injection_scripts.txt
│   ├── md_scsi_fault_injection_test_timeout.sh
│   ├── md_create_array.sh
│   ├── md_scsi_fault_injection_test.sh
│   ├── md_rerr_test_sample.sh
│   ├── md_werr_test_sample.sh
│   └── README.txt
├── disk_rwerr.stp
├── temporary_rerr.stp
├── sector_rerr.stp
├── temporary_werr.stp
├── disk_rerr.stp
├── rw_timeout.stp
├── w_timeout.stp
├── r_timeout.stp
├── fault_injection_common_sata
│   ├── scsi_timeout_injection.stp
│   └── scsi_fault_injection.stp
├── fault_injection_common_scsi
│   ├── scsi_timeout_injection.stp
│   └── scsi_fault_injection.stp
├── fault_injection_common_sata_raid56
│   ├── scsi_timeout_injection.stp
│   └── scsi_fault_injection.stp
├── fault_injection_common_scsi_raid56
│   ├── scsi_timeout_injection.stp
│   └── scsi_fault_injection.stp
├── doc
│   └── sample_usage.txt
├── README.txt
└── License.txt

/sample_scripts/scsi_fault_injection_scripts.txt:
--------------------------------------------------------------------------------
read	temporary_rerr.stp
read	disk_rerr.stp
read	sector_rerr.stp
read	disk_rwerr.stp
write	temporary_werr.stp

--------------------------------------------------------------------------------
/sample_scripts/md_scsi_fault_injection_test_timeout.sh:
--------------------------------------------------------------------------------
#!/bin/bash

DISKTYPE=$1
RAIDLEVEL=$2
FILENAME=0
DIRNAME='results_'$(date --rfc-3339=date)_timeout

mkdir ./$DIRNAME

#echo $tempresult
grep read scsi_fault_injection_scripts.txt | cut -f 2 > readscripts
grep write scsi_fault_injection_scripts.txt | cut -f 2 > writescripts


echo "####"
echo "raid $RAIDLEVEL"
echo "####"

for env in norm red deg recov
do
    echo "raid $RAIDLEVEL status $env"

    echo " read timeout"
    date
    echo "*"
    echo "* r_timeout.stp on RAID$RAIDLEVEL $env condition"
    echo "*"
    FILENAME="$DISKTYPE-RAID$RAIDLEVEL-$env-r_timeout"
    sh md_rerr_test_sample.sh $DISKTYPE $RAIDLEVEL $env r_timeout.stp > ./$DIRNAME/$FILENAME.result


    echo " write timeout"
    date
    echo "*"
    echo "* w_timeout.stp on RAID$RAIDLEVEL $env condition"
    echo "*"
    FILENAME="$DISKTYPE-RAID$RAIDLEVEL-$env-w_timeout"
    sh md_werr_test_sample.sh $DISKTYPE $RAIDLEVEL $env w_timeout.stp > ./$DIRNAME/$FILENAME.result
done

rm readscripts
rm writescripts
rm stapresult.txt

--------------------------------------------------------------------------------
/disk_rwerr.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 1
    access_type = 3
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/sample_scripts/md_create_array.sh:
--------------------------------------------------------------------------------
#!/bin/bash

#
# Before running this script, you need /dev/sdb1, /dev/sdc1, /dev/sdd1, /dev/sde1
# partitions with ID = 0xfd (Linux raid). They must all be the same size.
#


GREPRESULT=0
RESULT=1
MDDEV=/dev/mdx
RAIDTYPE=$1
RAIDENV=$2

echo "create an array"


case $RAIDTYPE in
    "1") mdadm -C /dev/md0 --assume-clean -R -l1 -n2 /dev/sd[bc]1; RESULT=$?; MDDEV=/dev/md0;;
    "10") mdadm -C /dev/md0 --assume-clean -R -l10 -n4 /dev/sd[bcde]1; RESULT=$?; MDDEV=/dev/md0;;
    *) echo "illegal request"; exit;
esac

# if the raid array was created, wait for initialization.
if [ $RESULT -eq 0 ]; then

    while [ $GREPRESULT -eq 0 ]
    do
        cat /proc/mdstat | grep "resync" > /dev/null
        GREPRESULT=$?
    done

    echo "resync done"
    # create ext3 filesystem on the RAID device
    mkfs -t ext3 $MDDEV
    sleep 3
fi


case $RAIDENV in
    "norm") ;;
    "red") mdadm /dev/md0 -a /dev/sdd1 || mdadm /dev/md0 -a /dev/sdf1;;
    "deg" | "recov") mdadm /dev/md0 -f /dev/sdb1; \
        sleep 1; \
        mdadm /dev/md0 -f /dev/sde1; \
        sleep 1;;
    *) echo "illegal request"; exit;
esac

sleep 1

# show the md status
cat /proc/mdstat

--------------------------------------------------------------------------------
/temporary_rerr.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 0
    access_type = 2
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/sector_rerr.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 1
    access_type = 4
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/temporary_werr.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 0
    access_type = 1
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/disk_rerr.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 1
    access_type = 2
    timeout_flag = 0
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/rw_timeout.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 1
    access_type = 3

    timeout_flag = 1
    timeout_period = 10000
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/w_timeout.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 0
    access_type = 1

    timeout_flag = 1
    timeout_period = 10000
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/r_timeout.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

probe begin
{
    dev_major = $1
    dev_minor_min = $2
    dev_minor_max = $3
    inode_lba_flag = $4
    inode_lba_val = $5
    error_type = 0
    access_type = 2

    timeout_flag = 1
    timeout_period = 10000
}

probe end
{
    printf("\nDONE\n")
}

--------------------------------------------------------------------------------
/sample_scripts/md_scsi_fault_injection_test.sh:
--------------------------------------------------------------------------------
#!/bin/bash

DISKTYPE=$1
RAIDLEVEL=$2
FILENAME=0
DIRNAME='results_'$(date --rfc-3339=date)

mkdir ./$DIRNAME

#echo $tempresult
grep read scsi_fault_injection_scripts.txt | cut -f 2 > readscripts
grep write scsi_fault_injection_scripts.txt | cut -f 2 > writescripts


echo "####"
echo "raid $RAIDLEVEL"
echo "####"

for env in norm red deg recov
do
    echo "raid $RAIDLEVEL status $env"

    echo " read err"
    while read scriptname
    do
        date
        echo "*"
        echo "* $scriptname on RAID$RAIDLEVEL $env condition"
        echo "*"
        FILENAME="$DISKTYPE-RAID$RAIDLEVEL-$env-$scriptname"
        sh md_rerr_test_sample.sh $DISKTYPE $RAIDLEVEL $env $scriptname > ./$DIRNAME/$FILENAME.result
    done < readscripts

    echo " write err"
    while read scriptname
    do
        date
        echo "*"
        echo "* $scriptname on RAID$RAIDLEVEL $env condition"
        echo "*"
        FILENAME="$DISKTYPE-RAID$RAIDLEVEL-$env-$scriptname"
        sh md_werr_test_sample.sh $DISKTYPE $RAIDLEVEL $env $scriptname > ./$DIRNAME/$FILENAME.result
    done < writescripts

done

rm readscripts
rm writescripts
rm stapresult.txt

--------------------------------------------------------------------------------
/fault_injection_common_sata/scsi_timeout_injection.stp:
--------------------------------------------------------------------------------
#! /usr/bin/env stap

# SCSI fault injection library using SystemTap
# Copyright (C) 2007 NEC Corporation
# Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007.
# Result of Open Source Software Development Activities of
# Information-technology Promotion Agency, Japan.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
#

%{
#include
#include
#include
#include
#include
#include
#include
#include
#include
%}

global global_scmd
global startsite
global entrynext
global timeoutfunction
global restore_state
global target_access_t
global entire_retries
global timeout_period

function restore_val:long (cmd:long, entrynext:long, startsite:long, func:long, hoststs:long)
%{
    struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd;

    scmd->eh_timeout.data = (unsigned long)(long)THIS->cmd;
    scmd->eh_timeout.start_site = (void *)(long)THIS->startsite;
    scmd->eh_timeout.function = (void (*)(unsigned long))(long)THIS->func;
    scmd->eh_timeout.entry.next = (void *)(long)THIS->entrynext;
    scmd->device->host->shost_state = (int)(long)THIS->hoststs;
    scmd->result = 0;

    THIS->__retvalue = (unsigned int)scmd->result;
%}

function save_start_site:long (cmd:long)
%{
    struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd;
    /* cast through long so the pointer is not truncated on 64-bit kernels */
    THIS->__retvalue = (long)scmd->eh_timeout.start_site;
%}

function save_entry_next:long (cmd:long)
%{
    struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd;
    long tempval;
    /* save the full pointer value, then clear it; restore_val() puts it back */
    tempval = (long)scmd->eh_timeout.entry.next;
    scmd->eh_timeout.entry.next = NULL;
    THIS->__retvalue = tempval;
%}

--------------------------------------------------------------------------------
/sample_scripts/md_rerr_test_sample.sh:
--------------------------------------------------------------------------------
#!/bin/bash

#
GREPRESULT=1
TESTDIR=/home/test
TOOLDIR=".."
7 | TARGETINODE=0 8 | DEVTYPE=$1 9 | RAIDTYPE=$2 10 | RAIDENV=$3 11 | SCRIPTNAME=$4 12 | RAIDDEV=/dev/md0 13 | TESTFILENAME='targetfile_'$(date --rfc-3339=date) 14 | TARGETDEVS="8 16 81" 15 | 16 | if [ $RAIDENV = "recov" ]; then 17 | TARGETDEVS="8 33 81" 18 | fi 19 | 20 | 21 | if [ -z "$DEVTYPE" -o -z "$RAIDTYPE" -o -z "$SCRIPTNAME" ]; then 22 | echo "usage: sh $0 device(scsi or sata) raid(1 or 10) env scriptname" 23 | echo " e.g. sh $0 scsi 1 norm disk_rerr.stp" 24 | exit 25 | fi 26 | 27 | if [ $RAIDTYPE -ne 1 -a $RAIDTYPE -ne 10 ]; then 28 | echo "usage: sh $0 device(scsi or sata) raid(1 or 10)" 29 | echo " e.g. sh $0 scsi 1" 30 | exit 31 | fi 32 | 33 | # cleanup 34 | umount $TESTDIR 35 | mdadm -S $RAIDDEV 36 | mkdir /home/test 37 | 38 | # create an array to be tested 39 | echo "#############################" 40 | echo "# " 41 | echo "# Creating an array" 42 | echo "# " 43 | echo "#############################" 44 | sh ./md_create_array.sh $RAIDTYPE $RAIDENV 45 | 46 | # mount md array 47 | mount -t ext3 $RAIDDEV $TESTDIR 48 | sleep 5 49 | 50 | # create a target file for a fault injection and get its inode num 51 | dd if=/dev/urandom of=$TESTDIR/targetfile bs=1000 count=1000 52 | cp $TESTDIR/targetfile /tmp/$TESTFILENAME 53 | 54 | echo "test file to inject a fault created" 55 | 56 | TARGETINODE=$(ls -ail $TESTDIR | grep targetfile | awk '{print $1}') 57 | echo "TARGETINODE= $TARGETINODE" 58 | 59 | # 60 | # main routine 61 | # 62 | sync 63 | ls $TESTDIR -hil 64 | 65 | # purge cache 66 | echo 1 > /proc/sys/vm/drop_caches; 67 | sleep 2 68 | 69 | # Run the script 70 | echo "#############################" 71 | echo "# " 72 | echo "# Runing $SCRIPTNAME " 73 | echo "# " 74 | echo "#############################" 75 | stap $TOOLDIR/$SCRIPTNAME $TARGETDEVS 1 $TARGETINODE -g -I $TOOLDIR/fault_injection_common_$DEVTYPE/ -v | tee stapresult.txt & 76 | 77 | echo "waiting stap startup" 78 | sleep 1 79 | GREPRESULT=1 80 | while [ $GREPRESULT -eq 1 ] 81 | do 82 | grep "BEGIN" 
stapresult.txt > /dev/null 83 | GREPRESULT=$? 84 | done 85 | 86 | if [ $RAIDENV = "recov" ]; then 87 | # recovering /dev/sdb1 88 | mdadm /dev/md0 -r /dev/sdb1 89 | sleep 1 90 | mdadm /dev/md0 -a /dev/sdb1 91 | sleep 1 92 | cat /proc/mdstat 93 | fi 94 | 95 | echo "#############################" 96 | echo "# " 97 | echo "# Injecting a fault " 98 | echo "# " 99 | echo "#############################" 100 | date 101 | sleep 1 102 | cat $TESTDIR/targetfile > /tmp/targetfile_after 103 | sleep 5 104 | echo "Fault injected" 105 | 106 | # stop the script 107 | pkill stap 108 | sleep 10 109 | 110 | # print the result, see also the script printout 111 | echo "#############################" 112 | echo "# " 113 | echo "# After the fault injection. Show the results " 114 | echo "# " 115 | echo "#############################" 116 | sleep 5 117 | 118 | # compare the original file with the read file 119 | echo "verify the contents when a fault injected(by cmp command)" 120 | cmp /tmp/$TESTFILENAME /tmp/targetfile_after 121 | echo "cmp result = $?" 122 | tail -n 50 /var/log/messages 123 | cat /proc/mdstat 124 | 125 | # wait for recovery completion if needed 126 | cat /proc/mdstat | grep "recovery" > /dev/null 127 | GREPRESULT=$? 128 | if [ $GREPRESULT -eq 0 ]; then 129 | 130 | while [ $GREPRESULT -eq 0 ] 131 | do 132 | cat /proc/mdstat | grep "recovery" > /dev/null 133 | GREPRESULT=$? 134 | done 135 | echo "recovery done" 136 | cat /proc/mdstat 137 | fi 138 | 139 | # cleanup 140 | umount $RAIDDEV 141 | umount /media/disk 142 | mdadm -S $RAIDDEV 143 | rm -rf /home/test 144 | 145 | -------------------------------------------------------------------------------- /fault_injection_common_scsi/scsi_timeout_injection.stp: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env stap 2 | 3 | # SCSI fault injection library using SystemTap 4 | # Copyright (C) 2007 NEC Corporation 5 | # Copyright(c) Information-technology Promotion Agency, Japan. 
All rights reserved 2007. 6 | # Result of Open Source Software Development Activities of 7 | # Information-technology Promotion Agency, Japan. 8 | # 9 | # This program is free software; you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation; either version 2 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # This program is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with this program; if not, write to the Free Software 21 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 22 | # 23 | 24 | %{ 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | #include 34 | %} 35 | 36 | global global_scmd 37 | global startsite 38 | global entrynext 39 | global timeoutfunction 40 | global restore_state 41 | global target_access_t 42 | global entire_retries 43 | global timeout_period 44 | 45 | function restore_val:long (cmd:long, entrynext:long, startsite:long, func:long, hoststs:long) 46 | %{ 47 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 48 | 49 | scmd->eh_timeout.data = (unsigned long)(long)THIS->cmd; 50 | scmd->eh_timeout.start_site = (void *)(long)THIS->startsite; 51 | scmd->eh_timeout.function = (void (*)(unsigned long))(long)THIS->func; 52 | scmd->eh_timeout.entry.next = (void *)(long)THIS->entrynext; 53 | scmd->device->host->shost_state = (int)(long)THIS->hoststs; 54 | scmd->result = 0; 55 | 56 | THIS->__retvalue = (unsigned int)scmd->result; 57 | %} 58 | 59 | function save_start_site:long (cmd:long) 60 | %{ 61 | struct scsi_cmnd * scmd = (struct scsi_cmnd 
*)(long)THIS->cmd; 62 | THIS->__retvalue = (unsigned int)scmd->eh_timeout.start_site; 63 | %} 64 | 65 | function save_entry_next:long (cmd:long) 66 | %{ 67 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 68 | unsigned int tempval; 69 | tempval = (unsigned int)scmd->eh_timeout.entry.next; 70 | scmd->eh_timeout.entry.next = NULL; 71 | THIS->__retvalue = tempval; 72 | %} 73 | 74 | 75 | probe module("*").function("scsi_delete_timer@drivers/scsi/scsi_error.c") { 76 | 77 | if(global_scmd == $scmd) 78 | { 79 | startsite = save_start_site($scmd) 80 | entrynext = save_entry_next($scmd) 81 | printf(" scsi_delete_timer: scmd= %d, startsite= %d, next= %d \n", $scmd, startsit, entrynext) 82 | } 83 | } 84 | 85 | probe module("*").function("scsi_delete_timer@drivers/scsi/scsi_error.c").return { 86 | 87 | if(global_scmd == $scmd) 88 | { 89 | result = restore_val($scmd, entrynext, startsite, timeoutfunction, restore_state) 90 | printf(" scsi_delete_timer: return= %d, scmd= %d global_scmd= %d result= %d \n", $return, $scmd, global_scmd, result) 91 | target_access_t = 0 92 | global_scmd = 0 93 | restore_state = 0 94 | } 95 | } 96 | 97 | probe module("*").statement("scsi_add_timer@drivers/scsi/scsi_error.c") 98 | { 99 | if((target_access_t != 0) && (global_scmd == $scmd)) 100 | { 101 | timeoutfunction = $complete 102 | $timeout = timeout_period 103 | printf("\n scsi_add_timer: inode= %d, scmd= %d, timeout= %d, delete_timer_block= %d, timeoutfinction= %d \n", inode, $scmd, $timeout, delete_timer_block, timeoutfunction) 104 | } 105 | } 106 | -------------------------------------------------------------------------------- /fault_injection_common_sata_raid56/scsi_timeout_injection.stp: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env stap 2 | 3 | # SCSI fault injection library using SystemTap 4 | # Copyright (C) 2007 NEC Corporation 5 | # Copyright(c) Information-technology Promotion Agency, Japan. 
All rights reserved 2007. 6 | # Result of Open Source Software Development Activities of 7 | # Information-technology Promotion Agency, Japan. 8 | # 9 | # This program is free software; you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation; either version 2 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # This program is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with this program; if not, write to the Free Software 21 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 22 | # 23 | 24 | %{ 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | #include 34 | %} 35 | 36 | global global_scmd 37 | global startsite 38 | global entrynext 39 | global timeoutfunction 40 | global restore_state 41 | global target_access_t 42 | global entire_retries 43 | global timeout_period 44 | 45 | function restore_val:long (cmd:long, entrynext:long, startsite:long, func:long, hoststs:long) 46 | %{ 47 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 48 | 49 | scmd->eh_timeout.data = (unsigned long)(long)THIS->cmd; 50 | scmd->eh_timeout.start_site = (void *)(long)THIS->startsite; 51 | scmd->eh_timeout.function = (void (*)(unsigned long))(long)THIS->func; 52 | scmd->eh_timeout.entry.next = (void *)(long)THIS->entrynext; 53 | scmd->device->host->shost_state = (int)(long)THIS->hoststs; 54 | scmd->result = 0; 55 | 56 | THIS->__retvalue = (unsigned int)scmd->result; 57 | %} 58 | 59 | function save_start_site:long (cmd:long) 60 | %{ 61 | struct scsi_cmnd * scmd = (struct scsi_cmnd 
*)(long)THIS->cmd; 62 | THIS->__retvalue = (unsigned int)scmd->eh_timeout.start_site; 63 | %} 64 | 65 | function save_entry_next:long (cmd:long) 66 | %{ 67 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 68 | unsigned int tempval; 69 | tempval = (unsigned int)scmd->eh_timeout.entry.next; 70 | scmd->eh_timeout.entry.next = NULL; 71 | THIS->__retvalue = tempval; 72 | %} 73 | 74 | 75 | probe module("*").function("scsi_delete_timer@drivers/scsi/scsi_error.c") { 76 | 77 | if(global_scmd == $scmd) 78 | { 79 | startsite = save_start_site($scmd) 80 | entrynext = save_entry_next($scmd) 81 | printf(" scsi_delete_timer: scmd= %d, startsite= %d, next= %d \n", $scmd, startsit, entrynext) 82 | } 83 | } 84 | 85 | probe module("*").function("scsi_delete_timer@drivers/scsi/scsi_error.c").return { 86 | 87 | if(global_scmd == $scmd) 88 | { 89 | result = restore_val($scmd, entrynext, startsite, timeoutfunction, restore_state) 90 | printf(" scsi_delete_timer: return= %d, scmd= %d global_scmd= %d result= %d \n", $return, $scmd, global_scmd, result) 91 | target_access_t = 0 92 | global_scmd = 0 93 | restore_state = 0 94 | } 95 | } 96 | 97 | probe module("*").statement("scsi_add_timer@drivers/scsi/scsi_error.c") 98 | { 99 | if((target_access_t != 0) && (global_scmd == $scmd)) 100 | { 101 | timeoutfunction = $complete 102 | $timeout = timeout_period 103 | printf("\n scsi_add_timer: inode= %d, scmd= %d, timeout= %d, delete_timer_block= %d, timeoutfinction= %d \n", inode, $scmd, $timeout, delete_timer_block, timeoutfunction) 104 | } 105 | } 106 | -------------------------------------------------------------------------------- /fault_injection_common_scsi_raid56/scsi_timeout_injection.stp: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env stap 2 | 3 | # SCSI fault injection library using SystemTap 4 | # Copyright (C) 2007 NEC Corporation 5 | # Copyright(c) Information-technology Promotion Agency, Japan. 
All rights reserved 2007. 6 | # Result of Open Source Software Development Activities of 7 | # Information-technology Promotion Agency, Japan. 8 | # 9 | # This program is free software; you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation; either version 2 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # This program is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with this program; if not, write to the Free Software 21 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 22 | # 23 | 24 | %{ 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | #include 34 | %} 35 | 36 | global global_scmd 37 | global startsite 38 | global entrynext 39 | global timeoutfunction 40 | global restore_state 41 | global target_access_t 42 | global entire_retries 43 | global timeout_period 44 | 45 | function restore_val:long (cmd:long, entrynext:long, startsite:long, func:long, hoststs:long) 46 | %{ 47 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 48 | 49 | scmd->eh_timeout.data = (unsigned long)(long)THIS->cmd; 50 | scmd->eh_timeout.start_site = (void *)(long)THIS->startsite; 51 | scmd->eh_timeout.function = (void (*)(unsigned long))(long)THIS->func; 52 | scmd->eh_timeout.entry.next = (void *)(long)THIS->entrynext; 53 | scmd->device->host->shost_state = (int)(long)THIS->hoststs; 54 | scmd->result = 0; 55 | 56 | THIS->__retvalue = (unsigned int)scmd->result; 57 | %} 58 | 59 | function save_start_site:long (cmd:long) 60 | %{ 61 | struct scsi_cmnd * scmd = (struct scsi_cmnd 
*)(long)THIS->cmd; 62 | THIS->__retvalue = (long)scmd->eh_timeout.start_site; 63 | %} 64 | 65 | function save_entry_next:long (cmd:long) 66 | %{ 67 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 68 | long tempval; 69 | tempval = (long)scmd->eh_timeout.entry.next; 70 | scmd->eh_timeout.entry.next = NULL; 71 | THIS->__retvalue = tempval; 72 | %} 73 | 74 | 75 | probe module("*").function("scsi_delete_timer@drivers/scsi/scsi_error.c") { 76 | 77 | if(global_scmd == $scmd) 78 | { 79 | startsite = save_start_site($scmd) 80 | entrynext = save_entry_next($scmd) 81 | printf(" scsi_delete_timer: scmd= %d, startsite= %d, next= %d \n", $scmd, startsite, entrynext) 82 | } 83 | } 84 | 85 | probe module("*").function("scsi_delete_timer@drivers/scsi/scsi_error.c").return { 86 | 87 | if(global_scmd == $scmd) 88 | { 89 | result = restore_val($scmd, entrynext, startsite, timeoutfunction, restore_state) 90 | printf(" scsi_delete_timer: return= %d, scmd= %d global_scmd= %d result= %d \n", $return, $scmd, global_scmd, result) 91 | target_access_t = 0 92 | global_scmd = 0 93 | restore_state = 0 94 | } 95 | } 96 | 97 | probe module("*").statement("scsi_add_timer@drivers/scsi/scsi_error.c") 98 | { 99 | if((target_access_t != 0) && (global_scmd == $scmd)) 100 | { 101 | timeoutfunction = $complete 102 | $timeout = timeout_period 103 | printf("\n scsi_add_timer: inode= %d, scmd= %d, timeout= %d, delete_timer_block= %d, timeoutfunction= %d \n", inode, $scmd, $timeout, delete_timer_block, timeoutfunction) 104 | } 105 | } 106 | -------------------------------------------------------------------------------- /sample_scripts/md_werr_test_sample.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # 4 | GREPRESULT=1 5 | TESTDIR=/home/test 6 | TOOLDIR=".."
7 | TARGETINODE=0 8 | DEVTYPE=$1 9 | RAIDTYPE=$2 10 | RAIDENV=$3 11 | SCRIPTNAME=$4 12 | RAIDDEV=/dev/md0 13 | TESTFILENAME='targetfile_'$(date --rfc-3339=date) 14 | TARGETDEVS="8 16 81" 15 | 16 | if [ "$RAIDENV" = "recov" ]; then 17 | TARGETDEVS="8 33 81" 18 | fi 19 | 20 | 21 | if [ -z "$DEVTYPE" -o -z "$RAIDTYPE" -o -z "$RAIDENV" -o -z "$SCRIPTNAME" ]; then 22 | echo "usage: sh $0 device(scsi or sata) raid(1 or 10) env scriptname" 23 | echo " e.g. sh $0 scsi 1 deg disk_rerr.stp" 24 | exit 1 25 | fi 26 | 27 | if [ $RAIDTYPE -ne 1 -a $RAIDTYPE -ne 10 ]; then 28 | echo "usage: sh $0 device(scsi or sata) raid(1 or 10) env scriptname" 29 | echo " e.g. sh $0 scsi 1 deg disk_rerr.stp" 30 | exit 1 31 | fi 32 | 33 | 34 | 35 | # cleanup 36 | umount $TESTDIR 37 | mdadm -S $RAIDDEV 38 | mkdir -p $TESTDIR 39 | 40 | # create an array to be tested 41 | echo "#############################" 42 | echo "# " 43 | echo "# Creating an array" 44 | echo "# " 45 | echo "#############################" 46 | sh ./md_create_array.sh $RAIDTYPE $RAIDENV 47 | 48 | # mount md array 49 | mount -t ext3 $RAIDDEV $TESTDIR 50 | sleep 5 51 | 52 | # create a target file for a fault injection and get its inode num 53 | dd if=/dev/urandom of=$TESTDIR/targetfile bs=1000 count=1000 54 | cp $TESTDIR/targetfile /tmp/$TESTFILENAME 55 | echo "test write" >> /tmp/$TESTFILENAME 56 | 57 | echo "test file to inject a fault created" 58 | 59 | TARGETINODE=$(ls -ail $TESTDIR | grep targetfile | awk '{print $1}') 60 | echo "TARGETINODE= $TARGETINODE" 61 | 62 | # 63 | # main routine 64 | # 65 | sync 66 | ls $TESTDIR -hil 67 | 68 | # purge cache 69 | echo 1 > /proc/sys/vm/drop_caches; 70 | sleep 2 71 | cat $TESTDIR/targetfile > /dev/null 72 | 73 | # Run the script 74 | echo "#############################" 75 | echo "# " 76 | echo "# Running $SCRIPTNAME " 77 | echo "# " 78 | echo "#############################" 79 | stap $TOOLDIR/$SCRIPTNAME $TARGETDEVS 1 $TARGETINODE -g -I $TOOLDIR/fault_injection_common_$DEVTYPE/ -v | tee stapresult.txt & 80 | 81 | echo "waiting stap startup"
82 | sleep 1 83 | GREPRESULT=1 84 | while [ $GREPRESULT -eq 1 ] 85 | do 86 | grep "BEGIN" stapresult.txt > /dev/null 87 | GREPRESULT=$? 88 | done 89 | 90 | if [ $RAIDENV = "recov" ]; then 91 | # recovering /dev/sdb1 92 | mdadm /dev/md0 -r /dev/sdb1 93 | sleep 1 94 | mdadm /dev/md0 -a /dev/sdb1 95 | sleep 1 96 | cat /proc/mdstat 97 | fi 98 | 99 | echo "#############################" 100 | echo "# " 101 | echo "# Injecting a fault " 102 | echo "# " 103 | echo "#############################" 104 | date 105 | sleep 1 106 | # write to the test file 107 | echo "test write" >> $TESTDIR/targetfile 108 | sleep 1 109 | # sync command to start actual write 110 | sync 111 | echo "Fault injected" 112 | 113 | # stop the script 114 | pkill stap 115 | sleep 10 116 | 117 | # print the result, see also the script printout 118 | echo "#############################" 119 | echo "# " 120 | echo "# After the fault injection. Show the results " 121 | echo "# " 122 | echo "#############################" 123 | sleep 5 124 | 125 | # compare the original file with the read file 126 | echo "verify the contents when a fault injected" 127 | sync 128 | echo 1 > /proc/sys/vm/drop_caches; 129 | cmp /tmp/$TESTFILENAME $TESTDIR/targetfile 130 | echo "cmp result = $?" 131 | tail -n 50 /var/log/messages 132 | cat /proc/mdstat 133 | 134 | # wait for recovery completion if needed 135 | cat /proc/mdstat | grep "recovery" > /dev/null 136 | GREPRESULT=$? 137 | if [ $GREPRESULT -eq 0 ]; then 138 | 139 | while [ $GREPRESULT -eq 0 ] 140 | do 141 | cat /proc/mdstat | grep "recovery" > /dev/null 142 | GREPRESULT=$? 
143 | done 144 | echo "recovery done" 145 | cat /proc/mdstat 146 | fi 147 | 148 | # cleanup 149 | umount $RAIDDEV 150 | umount /media/disk 151 | mdadm -S $RAIDDEV 152 | rm -rf $TESTDIR 153 | 154 | -------------------------------------------------------------------------------- /doc/sample_usage.txt: -------------------------------------------------------------------------------- 1 | 2 | Introduction 3 | ------------ 4 | This document describes sample usage of the SCSI fault injection test tool. 5 | As an example, it shows how to check the error handling routine of md-raid1 6 | in the case of a correctable read error. 7 | 8 | 9 | Hardware Requirement 10 | -------------------- 11 | x86 server running Fedora 8, 12 | SATA disk * 3, 13 | sda : system disk 14 | 15 | sdb : work disk1 16 | sdb1: 2GB partition 17 | 18 | sdc : work disk2 19 | sdc1: 2GB partition 20 | 21 | 22 | Test md-raid1 array by injecting faults 23 | --------------------------------------- 24 | 25 | 1. Set up SystemTap on the upstream kernel 26 | 27 | 1-1 Install Fedora8(x86) on /dev/sda with systemtap. 28 | 29 | 1-2 Compile the upstream kernel with debuginfo. 30 | (See Appendix A. of readme.txt) 31 | 32 | 1-3 Decompress the tool set and move it to an arbitrary directory. 33 | # cd /root 34 | # tar -jxf scsi_fault_injection_test_tool-1.0.0.tar.bz2 35 | 36 | 37 | 2. Create md raid1 array 38 | 39 | 2-1 Create an array 40 | # mdadm -C /dev/md0 -l1 -n2 -f /dev/sd[bc]1 41 | # cat /proc/mdstat 42 | Personalities : [raid1] 43 | md0 : active raid1 sdb1[0] sdc1[1] 44 | 1959808 blocks [2/2] [UU] 45 | 46 | 47 | 2-2 Create a filesystem 48 | # mkfs -t ext3 /dev/md0 49 | 50 | 2-3 Mount the md device 51 | # mount -t ext3 /dev/md0 /home/test 52 | 53 | 3. Create a target file to cause a fault 54 | 55 | 3-1 Create a file 56 | # cd /home/test 57 | # touch test.txt 58 | # echo "test file" >> test.txt 59 | # ls -hil 60 | 61 | -> We will use test.txt to inject a fault.
62 | 63 | 3-2 Check the inode number of the file 64 | 65 | # ls -hil 66 | 67 | total 28K 68 | 11 drwx------ 2 root root 16K 2007-12-21 18:22 lost+found 69 | 12 -rw-r--r-- 1 root root 21 2007-12-21 19:39 test.txt 70 | 71 | -> test.txt has inode 12 on the md device 72 | 73 | 4. Drop page caches 74 | 75 | At this moment, test.txt is probably in the page cache, so a read 76 | request would be served from the cache only. The page cache needs to be 77 | dropped to force a read access to the disk. 78 | 79 | # echo 1 > /proc/sys/vm/drop_caches 80 | 81 | 5. Run a script 82 | 83 | # stap /root/scsi_fault_injection_test_tool-1.0.0/sec_rerr.stp 8 17 33 1 12 -g \ 84 | -I /root/scsi_fault_injection_test_tool-1.0.0/fault_injection_common_sata -v 85 | 86 | Run sec_rerr.stp to set a hook for a correctable read error. 87 | Currently, /dev/sdb and /dev/sdc make up an md RAID1 array, 88 | so give minor-min=17 (sdb1), minor-max=33 (sdc1). 89 | We want to inject a fault by accessing test.txt, which is inode=12, so 90 | flag=1(inode) value=12. 91 | 92 | 6. Wait for SystemTap to be ready 93 | 94 | SystemTap takes about 30 seconds to get ready. 95 | When it is ready, "Pass 5: starting run" is shown on the screen. 96 | 97 | 7. Access the target device 98 | 99 | Open another console and read test.txt to inject a fault. 100 | 101 | # cat test.txt 102 | test file 103 | 104 | The cat command appears to complete as usual, but the error handling 105 | routine actually runs in the background. 106 | 107 | 108 | 8. Check the result 109 | 110 | The systemtap script prints the following messages on the console. 111 | 112 | ... 
113 | 114 | SCSI_DISPATCH_CMD: command= 40 115 | SCSI_DISPATCH_CMD: major= 8 minor= 16 116 | SCSI_DISPATCH_CMD: flag(0:LBA, 1:inode)= 1 117 | SCSI_DISPATCH_CMD: start sector= 2504855 118 | SCSI_DISPATCH_CMD: req bufflen= 4096 119 | SCSI_DISPATCH_CMD: inode= 12 120 | SCSI_DISPATCH_CMD: scmd = 4122759912 121 | SCSI_DISPATCH_CMD: [7]=0 [8]=8 122 | SCSI_DISPATCH_CMD: cmd-retries = 0 entire-retry =0 123 | 124 | SCSI_DISPATCH_CMD: cmd= 4122759912, allowed = 5 retries= 0 125 | SCSI_DISPATCH_CMD:scsi_cmnd= 4122759912 (host,channel,id,lun)= (0, 0, 1, 0) 126 | SCSI_DISPATCH_CMD:execname=cat, pexecname=bash 127 | scsi_decide_disposition : major=8 minor=16 scmd=4122759912 128 | scsi_next_command : cmd = 4122759912 129 | ... 130 | 131 | SCSI_DISPATCH_CMD: command= 42 132 | SCSI_DISPATCH_CMD: major= 8 minor= 16 133 | SCSI_DISPATCH_CMD: flag(0:LBA, 1:inode)= 1 134 | SCSI_DISPATCH_CMD: start sector= 2504855 135 | SCSI_DISPATCH_CMD: req bufflen= 4096 136 | ... 137 | 138 | In the middle of the messages, we can find a row beginning with 139 | "scsi_decide_disposition". This means that a fault was injected on 140 | the read access (command= 40) to sdb (major= 8 minor= 16) starting 141 | at LBA=2504855, concerning test.txt (inode= 12). 142 | 143 | We can also find a write (command= 42) to the same location 144 | (major= 8 minor= 16, start sector= 2504855). This write is issued 145 | by the md-raid1 error handling routine. 146 | 147 | The following is logged in the syslog. 148 | 149 | ... 150 | 151 | kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK 152 | kernel: sd 0:0:1:0: [sdb] Sense Key : Medium Error [current] 153 | kernel: sd 0:0:1:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed 154 | kernel: end_request: I/O error, dev sdb, sector 50825299 155 | kernel: raid1:md0: read error corrected (8 sectors at 9650704 on sdb3) 156 | kernel: raid1: sdc: redirecting sector 9650704 to another mirror 157 | ... 
158 | 159 | We can see that a read error occurred on sdb and was corrected. 160 | md-raid1 does not detach any disk because the error on sdb was corrected. 161 | 162 | # cat /proc/mdstat 163 | Personalities : [raid1] 164 | md0 : active raid1 sdb1[0] sdc1[1] 165 | 1959808 blocks [2/2] [UU] 166 | 167 | 168 | 9. Stop the running script 169 | 170 | The script started in step 5 keeps running. 171 | You have to stop it by hand. 172 | 173 | # pkill stap 174 | (or hit Ctrl-C) 175 | 176 | 177 | -------------------------------------------------------------------------------- /sample_scripts/README.txt: -------------------------------------------------------------------------------- 1 | 2 | These sample shell scripts are wrappers around the SCSI fault injection 3 | test tool. With these scripts, the error handling routines of md RAID1 4 | and md RAID10 can be tested easily. Users can see what happens 5 | when various kinds of SCSI faults occur on md RAID. 6 | 7 | 8 | Requirement 9 | ----------- 10 | - Fedora8 installed only on sda 11 | - kernel 2.6.22 ~ 2.6.23.14 12 | - systemtap working environment 13 | - scsi_fault_injection_test_tool ver 1.0.1 or later 14 | - At least 3 SATA/SCSI disks for a working md RAID array. 15 | These should be seen as sdb, sdc, sdd, sde, sdf, ... 16 | The first partition (such as sdb1, sdc1, ...) needs to be allocated 17 | with system ID 0xfd (linux raid autodetect). These partitions should 18 | be the same size for raid array creation. 19 | e.g. Using the fdisk command, allocate 2GB partitions sdb1, sdc1 and sdd1 20 | with system ID 0xfd. 21 | - root privilege to run the script.
22 | 23 | 24 | Usage 25 | ----- 26 | Move to the "sample_scripts" directory and run md_scsi_fault_injection_test.sh or 27 | md_scsi_fault_injection_test_timeout.sh as follows: 28 | 29 | #sh md_scsi_fault_injection_test.sh [scsi|sata] [1|10] 30 | #sh md_scsi_fault_injection_test_timeout.sh [scsi|sata] [1|10] 31 | 32 | 33 | Description 34 | ----------- 35 | These scripts automatically inject the following SCSI faults 36 | one by one into a dynamically created md RAID array under each of the 37 | following conditions. 38 | 39 | Faults: 40 | md_scsi_fault_injection_test.sh 41 | disk_rerr.stp (permanent read error simulation) 42 | disk_rwerr.stp (permanent read/write error simulation) 43 | sector_rerr.stp (simulation of a read error correctable by a write) 44 | temporary_rerr.stp (temporary read error simulation) 45 | temporary_werr.stp (temporary write error simulation) 46 | 47 | md_scsi_fault_injection_test_timeout.sh 48 | r_timeout.stp (simulation of a temporary non-response to read access) 49 | w_timeout.stp (simulation of a temporary non-response to write access) 50 | 51 | For more details, see readme.txt. 52 | 53 | Conditions: 54 | normal array (fully working array) 55 | degraded array (array with a disabled disk) 56 | redundant array (normal + spare disk) 57 | recovering array (array during recovery) 58 | 59 | Logs are automatically collected in the "results_`date`" directory and named 60 | "disktype"-RAID"raidlevel"-"condition"-"scriptname".result respectively. 61 | Each log consists of the systemtap output, /proc/mdstat, part of the syslog and all 62 | command activity in the scripts. 63 | Because the log includes all output from systemtap and the commands in the scripts, 64 | users need to extract the important parts from the log to see what happens in each case. 65 | 66 | The following sections in the log are important for following the error 67 | handling of the raid array. At the beginning of each section, the section name 68 | is surrounded by "########" in the log. 69 | 70 | 1.
Before fault injection 71 | In the "Creating an array" section, the array status taken from /proc/mdstat is 72 | logged before a fault is injected. 73 | 74 | 2. During fault injection 75 | In the "Injecting a fault " section, the output of the SCSI fault injection test tool 76 | is logged. This section includes the target location, which triggers the 77 | fault when accessed. 78 | 79 | 3. After fault injection 80 | In the "After the fault injection. Show the results" section, part of the syslog 81 | and the array status after the fault injection are recorded. The syslog shows 82 | the error handling activity of the md and SCSI layers, 83 | including fault detection, device detach, recovery sync, etc. 84 | The array status is taken from /proc/mdstat as before. Users can follow 85 | the array status transition by comparing the status before and after. 86 | 87 | The script verifies the result of the access made during fault injection with the 88 | cmp command. If an I/O error or data corruption occurs as a result of fault 89 | injection, the cmp result is nonzero. The result is also 90 | included in this section. 91 | 92 | The details are shown below using a sample log. 93 | 94 | 95 | Example 96 | ------- 97 | #sh md_scsi_fault_injection_test.sh scsi 10 98 | 99 | Inject disk_rerr.stp, disk_rwerr.stp, sector_rerr.stp, temporary_rerr.stp and 100 | temporary_werr.stp into the normal, degraded, redundant and recovering conditions of 101 | an md RAID10 array consisting of SCSI disks. 102 | 103 | #sh md_scsi_fault_injection_test_timeout.sh sata 1 104 | 105 | Inject r_timeout.stp and w_timeout.stp into the normal, degraded, redundant and 106 | recovering conditions of an md RAID1 array consisting of SATA disks. 107 | 108 | For the log analysis explanation, we use "scsi-RAID10-red-disk_rerr.stp.result" 109 | taken in our environment as a sample log. In the following explanation, 110 | actual messages extracted from the log are enclosed by "-----------". 111 | 112 | 113 | 1.
In the "Creating an array" section of the log, the status of the tested array is logged. 114 | In this case, the tested array is a fully working RAID10 with a spare disk. 115 | Users can find the following messages. 116 | ---------------------------------------------------------------- 117 | md0 : active raid10 sdf1[4](S) sde1[3] sdd1[2] sdc1[1] sdb1[0] 118 | 3919616 blocks 64K chunks 2 near-copies [4/4] [UUUU] 119 | ---------------------------------------------------------------- 120 | 121 | 2. In the "Injecting a fault " section of the log, the following messages appear. 122 | They include a row beginning with "scsi_decide_disposition". 123 | This means a fault was injected on a read access (command= 40) to 124 | sdb (major= 8 minor= 16). 125 | ---------------------------------------------------------------- 126 | SCSI_DISPATCH_CMD: command= 40 127 | SCSI_DISPATCH_CMD: major= 8 minor= 16 128 | SCSI_DISPATCH_CMD: flag(0:LBA, 1:inode)= 1 129 | SCSI_DISPATCH_CMD: start sector= 81983 130 | SCSI_DISPATCH_CMD: req bufflen= 16384 131 | SCSI_DISPATCH_CMD: inode= 12 132 | SCSI_DISPATCH_CMD: scmd = 4147687424 133 | SCSI_DISPATCH_CMD: [7]=0 [8]=32 134 | SCSI_DISPATCH_CMD: cmd-retries = 0 entire-retry =0 135 | 136 | SCSI_DISPATCH_CMD: cmd= 4147687424, allowed = 5 retries= 0 137 | SCSI_DISPATCH_CMD:scsi_cmnd= 4147687424 (host,channel,id,lun)= (0, 0, 1, 0) 138 | SCSI_DISPATCH_CMD:execname=cat, pexecname=sh 139 | scsi_decide_disposition : major=8 minor=16 scmd=4147687424 140 | scsi_next_command : cmd = 4147687424 141 | ---------------------------------------------------------------- 142 | 143 | 3. In the "After the fault injection. Show the results " section of the log, 144 | the syslog and the array status after the fault injection are logged. 145 | 146 | The following messages appear in the log several times. 147 | This means that the SCSI mid layer detected an error on sdb several times.
148 | ---------------------------------------------------------------- 149 | kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK 150 | kernel: sd 0:0:1:0: [sdb] Sense Key : Medium Error [current] 151 | kernel: sd 0:0:1:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed 152 | kernel: end_request: I/O error, dev sdb, sector 81983 153 | ---------------------------------------------------------------- 154 | 155 | So md RAID10 decides to detach sdb from the array. 156 | ---------------------------------------------------------------- 157 | kernel: raid10: Disk failure on sdb1, disabling device. 158 | kernel: #011Operation continuing on 3 devices 159 | ---------------------------------------------------------------- 160 | 161 | In this case, the md RAID10 array has a spare disk, so recovery begins 162 | automatically after sdb is disabled. 163 | ---------------------------------------------------------------- 164 | kernel: md: recovery of RAID array md0 165 | ---------------------------------------------------------------- 166 | 167 | After the fault injection, the array status shows that the recovery is 168 | finished and the array is fully working, with sdb detached. 169 | ---------------------------------------------------------------- 170 | Personalities : [raid10] 171 | md0 : active raid10 sdf1[0] sde1[3] sdd1[2] sdc1[1] sdb1[4](F) 172 | 3919616 blocks 64K chunks 2 near-copies [4/4] [UUUU] 173 | ---------------------------------------------------------------- 174 | 175 | In this case, the access completed successfully: the cmp result is 0. 176 | ---------------------------------------------------------------- 177 | cmp result = 0 178 | ---------------------------------------------------------------- 179 | 180 | 181 | Limitations 182 | ------------ 183 | - md_scsi_fault_injection_test_timeout.sh may fail because the 184 | root partition can become read-only as a result of timeout injection.
185 | 186 | - Some recently reported md bugs, which cause array deadlock, may be 187 | reproduced by running the scripts. 188 | 189 | - The scripts take a long time to finish because 5 * 4 or 2 * 4 fault 190 | patterns are tested automatically. For example, about 3 minutes are required to 191 | test a single pattern on a 2GB md RAID1 array, so the estimated wait time would 192 | be 5 * 4 * 3 = 60 minutes for md_scsi_fault_injection_test.sh 193 | and 2 * 4 * 3 = 24 minutes for md_scsi_fault_injection_test_timeout.sh. 194 | 195 | -------------------------------------------------------------------------------- /fault_injection_common_sata/scsi_fault_injection.stp: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env stap 2 | 3 | # SCSI fault injection library using SystemTap 4 | # Copyright (C) 2007 NEC Corporation 5 | # Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007. 6 | # Result of Open Source Software Development Activities of 7 | # Information-technology Promotion Agency, Japan. 8 | # 9 | # This program is free software; you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation; either version 2 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # This program is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details.
18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with this program; if not, write to the Free Software 21 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 22 | # 23 | 24 | %{ 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | #include 34 | %} 35 | 36 | global target_access 37 | global target_scmd 38 | global target_rq 39 | global target_bio 40 | global target_r1bio 41 | global target_block 42 | global fix_write 43 | global temp_failure 44 | 45 | global error_type 46 | global access_type 47 | global dev_major 48 | global dev_minor_min 49 | global dev_minor_max 50 | global inode_lba_flag 51 | global inode_lba_val 52 | global timeout_flag 53 | global retry_allowed 54 | global target_minor 55 | 56 | 57 | probe begin 58 | { 59 | target_block = -1 60 | target_minor = -1 61 | printf("\nBEGIN\n") 62 | } 63 | 64 | 65 | function set_sense_buf:long (cmd:long, result:long, sensekey:long, asc:long, ascq:long ) 66 | %{ 67 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 68 | 69 | scmd->result = (int)(long)THIS->result; 70 | scmd->sense_buffer[0] = 0x70; /* current, fixed format */ 71 | scmd->sense_buffer[2] = (unsigned char)(long)THIS->sensekey; 72 | scmd->sense_buffer[7] = 0x13; /* length */ 73 | scmd->sense_buffer[12] = (unsigned char)(long)THIS->asc; 74 | scmd->sense_buffer[13] = (unsigned char)(long)THIS->ascq; 75 | %} 76 | 77 | function get_inode:long (page:long) 78 | %{ 79 | struct page * thispage = (struct page *)(long)THIS->page; 80 | unsigned long tempval = (unsigned long)thispage->flags; 81 | struct address_space *mapping = thispage->mapping; 82 | 83 | if (unlikely(PageSwapCache(thispage))) 84 | mapping = NULL; 85 | #ifdef CONFIG_SLUB 86 | else if (unlikely(PageSlab(thispage))) 87 | mapping = NULL; 88 | #endif 89 | else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) 90 | mapping = NULL; 91 | 92 | 
if((mapping != NULL) && (mapping->host != NULL)) 93 | { 94 | THIS->__retvalue = (unsigned long)(mapping->host->i_ino); 95 | } else { 96 | THIS->__retvalue = 0; 97 | } 98 | %} 99 | 100 | 101 | probe module("*").function("scsi_decide_disposition@drivers/scsi/scsi_error.c") 102 | { 103 | scmd_direction = $scmd->sc_data_direction 104 | 105 | if((((temp_failure == 1) || (error_type == 1)) && (target_scmd == $scmd)) && ((scmd_direction == access_type) || (access_type == 3) || ((scmd_direction == 2) && (access_type== 4))) && ($scmd->request->rq_disk != 0)) 106 | { 107 | major = $scmd->request->rq_disk->major 108 | minor = $scmd->request->rq_disk->first_minor 109 | block = $scmd->request->__sector 110 | req_len = $scmd->sdb->length 111 | 112 | if(major == dev_major && minor == target_minor && ((block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9))))) 113 | { 114 | if((scmd_direction == 2) && (fix_write == 2)) 115 | { 116 | #fix_write = 0 117 | } else 118 | { 119 | printf("scsi_decide_disposition : major=%d minor=%d scmd=%d \n",major, minor, $scmd) 120 | /* create fake status and sense data */ 121 | temp_failure++ 122 | set_sense_buf($scmd, 0x02, 0x03, 0x11, 0x04) 123 | } 124 | } 125 | } 126 | } 127 | 128 | probe module("*").function("scsi_next_command@drivers/scsi/scsi_lib.c") 129 | { 130 | if((target_access != 0) && (target_scmd == $cmd)) 131 | { 132 | printf("scsi_next_command : cmd = %d \n", $cmd) 133 | target_access = 0 134 | target_scmd = 0 135 | target_rq = 0 136 | restore_state = 0 137 | } 138 | } 139 | 140 | probe module("*").function("scsi_dispatch_cmd@drivers/scsi/scsi.c") 141 | { 142 | struct_bio= $cmd->request->bio 143 | block = $cmd->request->__sector 144 | req_len = $cmd->sdb->length 145 | 146 | if($cmd->request->rq_disk != 0) 147 | { 148 | major = $cmd->request->rq_disk->major 149 | minor = $cmd->request->rq_disk->first_minor 150 | } 151 | 152 | if(target_block == -1) 153 | { 154 | if(struct_bio != 0) 155 | { 156 | 
page = $cmd->request->bio->bi_io_vec->bv_page 157 | if(page != 0) 158 | { 159 | inode = get_inode(page) 160 | } 161 | } 162 | } 163 | 164 | if(((inode_lba_flag ==1)&&(inode == inode_lba_val)) || ((inode_lba_flag ==0 ) && ((block <= inode_lba_val) && (inode_lba_val < block + (req_len >> 9)))) || (block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9)))) 165 | { 166 | printf("\nSCSI_DISPATCH_CMD: command= %d \n", $cmd->cmnd[0]) 167 | printf("SCSI_DISPATCH_CMD: major= %d minor= %d \n", major, minor) 168 | printf("SCSI_DISPATCH_CMD: flag(0:LBA, 1:inode)= %d \n", inode_lba_flag) 169 | printf("SCSI_DISPATCH_CMD: start sector= %d \n", $cmd->request->__sector) 170 | printf("SCSI_DISPATCH_CMD: req bufflen= %d \n", $cmd->sdb->length) 171 | printf("SCSI_DISPATCH_CMD: inode= %d \n", inode) 172 | printf("SCSI_DISPATCH_CMD: scmd = %d \n", $cmd) 173 | printf("SCSI_DISPATCH_CMD: [7]=%d [8]=%d \n", $cmd->cmnd[7],$cmd->cmnd[8]) 174 | 175 | if((target_minor== -1) && (major == dev_major) && ((dev_minor_min & 0xfff0) <= minor) && (minor <= (dev_minor_max & 0xfff0))) 176 | { 177 | tmp_minor = minor 178 | } 179 | 180 | if((major == dev_major && ((minor == tmp_minor) || (minor == target_minor))) && (fix_write != 2)) 181 | { 182 | /* inject errors on the designated device */ 183 | printf("SCSI_DISPATCH_CMD: cmd-retries = %d entire-retry =%d \n", $cmd->retries, entire_retries) 184 | cmd_direction = $cmd->sc_data_direction 185 | if((cmd_direction == 1) && (access_type == 4)) 186 | { 187 | fix_write = 2 188 | printf("SCSI_DISPATCH_CMD: fix_write =%d \n", fix_write) 189 | } 190 | 191 | if((temp_failure == 0) || (error_type == 1) ||((timeout_flag == 1) && (entire_retries < 1))) 192 | { 193 | if((cmd_direction == access_type) || ((cmd_direction == 2) && (access_type == 4)) || (access_type == 3)) 194 | { 195 | if(target_minor == -1) 196 | { 197 | target_minor = tmp_minor 198 | } 199 | 200 | if(target_block == -1) 201 | { 202 | target_block = block 203 | } 204 
| 205 | if(target_access == 0) 206 | { 207 | retry_allowed = $cmd->allowed 208 | target_access++ 209 | target_scmd = $cmd 210 | target_rq = $cmd->request 211 | } 212 | 213 | temp_failure++ 214 | 215 | if(($cmd->cmnd[0] == 0x28) || ($cmd->cmnd[0] == 0x2a)) 216 | { 217 | /* read_10 or write_10 */ 218 | $cmd->cmnd[7]=0 219 | $cmd->cmnd[8]=0 220 | }else if(($cmd->cmnd[0] == 0x88) || ($cmd->cmnd[0] == 0x8a)) 221 | { 222 | /* read_16 or write_16 */ 223 | $cmd->cmnd[10]=0 224 | $cmd->cmnd[11]=0 225 | $cmd->cmnd[12]=0 226 | $cmd->cmnd[13]=0 227 | }else if(($cmd->cmnd[0] == 0x08) || ($cmd->cmnd[0] == 0x0a)) 228 | { 229 | /* read_6 or write_6 */ 230 | $cmd->cmnd[4]=0 231 | } 232 | 233 | if(target_scmd == $cmd) 234 | { 235 | entire_retries++ 236 | } 237 | 238 | if((target_access_t == 0) && (timeout_flag == 1)) 239 | { 240 | target_access_t++ 241 | global_scmd = $cmd 242 | restore_state = $cmd->device->host->shost_state 243 | $cmd->device->host->shost_state = 4 244 | } 245 | } 246 | } 247 | 248 | printf("\nSCSI_DISPATCH_CMD: cmd= %d, allowed = %d retries= %d \n", $cmd, $cmd->allowed, $cmd->retries) 249 | printf("SCSI_DISPATCH_CMD:scsi_cmnd= %d (host,channel,id,lun)= (%d, %d, %d, %d) \n", $cmd, $cmd->device->host->host_no, $cmd->device->channel, $cmd->device->id, $cmd->device->lun) 250 | printf("SCSI_DISPATCH_CMD:execname=%s, pexecname=%s\n", execname(), pexecname()) 251 | } 252 | } 253 | } 254 | 255 | -------------------------------------------------------------------------------- /fault_injection_common_scsi/scsi_fault_injection.stp: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env stap 2 | 3 | # SCSI fault injection library using SystemTap 4 | # Copyright (C) 2007 NEC Corporation 5 | # Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007. 6 | # Result of Open Source Software Development Activities of 7 | # Information-technology Promotion Agency, Japan.
8 | # 9 | # This program is free software; you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation; either version 2 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # This program is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with this program; if not, write to the Free Software 21 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 22 | # 23 | 24 | %{ 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | #include 34 | %} 35 | 36 | global target_access 37 | global target_scmd 38 | global target_rq 39 | global target_bio 40 | global target_r1bio 41 | global target_block 42 | global fix_write 43 | global temp_failure 44 | 45 | global error_type 46 | global access_type 47 | global dev_major 48 | global dev_minor_min 49 | global dev_minor_max 50 | global inode_lba_flag 51 | global inode_lba_val 52 | global timeout_flag 53 | global retry_allowed 54 | global target_minor 55 | 56 | 57 | probe begin 58 | { 59 | target_block = -1 60 | target_minor = -1 61 | printf("\nBEGIN\n") 62 | } 63 | 64 | 65 | function set_sense_buf:long (cmd:long, result:long, sensekey:long, asc:long, ascq:long ) 66 | %{ 67 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 68 | 69 | scmd->result = (int)(long)THIS->result; 70 | scmd->sense_buffer[0] = 0x70; /* current, fixed format */ 71 | scmd->sense_buffer[2] = (unsigned char)(long)THIS->sensekey; 72 | scmd->sense_buffer[7] = 0x13; /* length */ 73 | scmd->sense_buffer[12] = (unsigned char)(long)THIS->asc; 74 | scmd->sense_buffer[13] = (unsigned 
char)(long)THIS->ascq; 75 | %} 76 | 77 | function get_inode:long (page:long) 78 | %{ 79 | struct page * thispage = (struct page *)(long)THIS->page; 80 | unsigned long tempval = (unsigned long)thispage->flags; 81 | struct address_space *mapping = thispage->mapping; 82 | 83 | if (unlikely(PageSwapCache(thispage))) 84 | mapping = NULL; 85 | #ifdef CONFIG_SLUB 86 | else if (unlikely(PageSlab(thispage))) 87 | mapping = NULL; 88 | #endif 89 | else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) 90 | mapping = NULL; 91 | 92 | if((mapping != NULL) && (mapping->host != NULL)) 93 | { 94 | THIS->__retvalue = (unsigned long)(mapping->host->i_ino); 95 | } else { 96 | THIS->__retvalue = 0; 97 | } 98 | %} 99 | 100 | 101 | probe module("*").function("scsi_decide_disposition@drivers/scsi/scsi_error.c") 102 | { 103 | scmd_direction = $scmd->sc_data_direction 104 | 105 | if((((temp_failure == 1) || (error_type == 1)) && (target_scmd == $scmd)) && ((scmd_direction == access_type) || (access_type == 3) || ((scmd_direction == 2) && (access_type== 4))) && ($scmd->request->rq_disk != 0)) 106 | { 107 | major = $scmd->request->rq_disk->major 108 | minor = $scmd->request->rq_disk->first_minor 109 | block = $scmd->request->sector 110 | req_len = $scmd->request_bufflen 111 | 112 | if(major == dev_major && minor == target_minor && ((block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9))))) 113 | { 114 | if((scmd_direction == 2) && (fix_write == 2)) 115 | { 116 | #fix_write = 0 117 | } else 118 | { 119 | printf("scsi_decide_disposition : major=%d minor=%d scmd=%d \n",major, minor, $scmd) 120 | /* create fake status and sense data */ 121 | temp_failure++ 122 | set_sense_buf($scmd, 0x02, 0x03, 0x11, 0x04) 123 | } 124 | } 125 | } 126 | } 127 | 128 | probe module("*").function("scsi_next_command@drivers/scsi/scsi_lib.c") 129 | { 130 | if((target_access != 0) && (target_scmd == $cmd)) 131 | { 132 | printf("scsi_next_command : cmd = %d \n", $cmd) 133 | 
target_access = 0 134 | target_scmd = 0 135 | target_rq = 0 136 | restore_state = 0 137 | } 138 | } 139 | 140 | probe module("*").function("scsi_dispatch_cmd@drivers/scsi/scsi.c") 141 | { 142 | struct_bio= $cmd->request->bio 143 | block = $cmd->request->sector 144 | req_len = $cmd->request_bufflen 145 | 146 | if($cmd->request->rq_disk != 0) 147 | { 148 | major = $cmd->request->rq_disk->major 149 | minor = $cmd->request->rq_disk->first_minor 150 | } 151 | 152 | if(target_block == -1) 153 | { 154 | if(struct_bio != 0) 155 | { 156 | page = $cmd->request->bio->bi_io_vec->bv_page 157 | if(page != 0) 158 | { 159 | inode = get_inode(page) 160 | } 161 | } 162 | } 163 | 164 | if(((inode_lba_flag ==1)&&(inode == inode_lba_val)) || ((inode_lba_flag ==0 ) && ((block <= inode_lba_val) && (inode_lba_val < block + (req_len >> 9)))) || (block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9)))) 165 | { 166 | printf("\nSCSI_DISPATCH_CMD: command= %d \n", $cmd->cmnd[0]) 167 | printf("SCSI_DISPATCH_CMD: major= %d minor= %d \n", major, minor) 168 | printf("SCSI_DISPATCH_CMD: flag(0:LBA, 1:inode)= %d \n", inode_lba_flag) 169 | printf("SCSI_DISPATCH_CMD: start sector= %d \n", $cmd->request->sector) 170 | printf("SCSI_DISPATCH_CMD: req bufflen= %d \n", $cmd->request_bufflen) 171 | printf("SCSI_DISPATCH_CMD: inode= %d \n", inode) 172 | printf("SCSI_DISPATCH_CMD: scmd = %d \n", $cmd) 173 | printf("SCSI_DISPATCH_CMD: [7]=%d [8]=%d \n", $cmd->cmnd[7],$cmd->cmnd[8]) 174 | 175 | if((target_minor== -1) && (major == dev_major) && ((dev_minor_min & 0xfff0) <= minor) && (minor <= (dev_minor_max & 0xfff0))) 176 | { 177 | tmp_minor = minor 178 | } 179 | 180 | if((major == dev_major && ((minor == tmp_minor) || (minor == target_minor))) && (fix_write != 2)) 181 | { 182 | /* inject errors on the designated device */ 183 | printf("SCSI_DISPATCH_CMD: cmd-retries = %d entire-retry =%d \n", $cmd->retries, entire_retries) 184 | cmd_direction = $cmd->sc_data_direction 185 
| if((cmd_direction == 1) && (access_type == 4)) 186 | { 187 | fix_write = 2 188 | printf("SCSI_DISPATCH_CMD: fix_write =%d \n", fix_write) 189 | } 190 | 191 | if((temp_failure == 0) || (error_type == 1) ||((timeout_flag == 1) && (entire_retries <= retry_allowed))) 192 | { 193 | if((cmd_direction == access_type) || ((cmd_direction == 2) && (access_type == 4)) || (access_type == 3)) 194 | { 195 | if(target_minor == -1) 196 | { 197 | target_minor = tmp_minor 198 | } 199 | 200 | if(target_block == -1) 201 | { 202 | target_block = block 203 | } 204 | 205 | if(target_access == 0) 206 | { 207 | retry_allowed = $cmd->allowed 208 | target_access++ 209 | target_scmd = $cmd 210 | target_rq = $cmd->request 211 | } 212 | 213 | temp_failure++ 214 | 215 | if(($cmd->cmnd[0] == 0x28) || ($cmd->cmnd[0] == 0x2a)) 216 | { 217 | /* read_10 or write_10 */ 218 | $cmd->cmnd[7]=0 219 | $cmd->cmnd[8]=0 220 | }else if(($cmd->cmnd[0] == 0x88) || ($cmd->cmnd[0] == 0x8a)) 221 | { 222 | /* read_16 or write_16 */ 223 | $cmd->cmnd[10]=0 224 | $cmd->cmnd[11]=0 225 | $cmd->cmnd[12]=0 226 | $cmd->cmnd[13]=0 227 | }else if(($cmd->cmnd[0] == 0x08) || ($cmd->cmnd[0] == 0x0a)) 228 | { 229 | /* read_6 or write_6 */ 230 | $cmd->cmnd[4]=0 231 | } 232 | 233 | if(target_scmd == $cmd) 234 | { 235 | entire_retries++ 236 | } 237 | 238 | if((target_access_t == 0) && (timeout_flag == 1)) 239 | { 240 | target_access_t++ 241 | global_scmd = $cmd 242 | restore_state = $cmd->device->host->shost_state 243 | $cmd->device->host->shost_state = 4 244 | } 245 | } 246 | } 247 | 248 | printf("\nSCSI_DISPATCH_CMD: cmd= %d, allowed = %d retries= %d \n", $cmd, $cmd->allowed, $cmd->retries) 249 | printf("SCSI_DISPATCH_CMD:scsi_cmnd= %d (host,channel,id,lun)= (%d, %d, %d, %d) \n", $cmd, $cmd->device->host->host_no, $cmd->device->channel, $cmd->device->id, $cmd->device->lun) 250 | printf("SCSI_DISPATCH_CMD:execname=%s, pexecname=%s\n", execname(), pexecname()) 251 | } 252 | } 253 | } 254 | 255 |
-------------------------------------------------------------------------------- /fault_injection_common_sata_raid56/scsi_fault_injection.stp: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env stap 2 | 3 | # SCSI fault injection library using SystemTap 4 | # Copyright (C) 2007 NEC Corporation 5 | # Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007. 6 | # Result of Open Source Software Development Activities of 7 | # Information-technology Promotion Agency, Japan. 8 | # 9 | # This program is free software; you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation; either version 2 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # This program is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 
18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with this program; if not, write to the Free Software 21 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 22 | # 23 | 24 | %{ 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | #include 34 | #include 35 | %} 36 | 37 | global target_access 38 | global target_scmd 39 | global target_rq 40 | global target_bio 41 | global target_block 42 | global fix_write 43 | global temp_failure 44 | 45 | global error_type 46 | global access_type 47 | global dev_major 48 | global dev_minor_min 49 | global dev_minor_max 50 | global inode_lba_flag 51 | global inode_lba_val 52 | global timeout_flag 53 | global retry_allowed 54 | global target_minor 55 | 56 | global orig_bio 57 | global orig_sector 58 | global global_shsector 59 | 60 | 61 | probe begin 62 | { 63 | orig_sector = -1 64 | global_shsector = -1 65 | target_block = -1 66 | target_minor = -1 67 | printf("\nBEGIN\n") 68 | } 69 | 70 | 71 | function set_sense_buf:long (cmd:long, result:long, sensekey:long, asc:long, ascq:long ) 72 | %{ 73 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 74 | 75 | scmd->result = (int)(long)THIS->result; 76 | scmd->sense_buffer[0] = 0x70; /* current, fixed format */ 77 | scmd->sense_buffer[2] = (unsigned char)(long)THIS->sensekey; 78 | scmd->sense_buffer[7] = 0x13; /* length */ 79 | scmd->sense_buffer[12] = (unsigned char)(long)THIS->asc; 80 | scmd->sense_buffer[13] = (unsigned char)(long)THIS->ascq; 81 | %} 82 | 83 | function get_inode:long (page:long) 84 | %{ 85 | struct page * thispage = (struct page *)(long)THIS->page; 86 | unsigned long tempval = (unsigned long)thispage->flags; 87 | struct address_space *mapping = thispage->mapping; 88 | 89 | if (unlikely(PageSwapCache(thispage))) 90 | mapping = NULL; 91 | #ifdef CONFIG_SLUB 92 | else if (unlikely(PageSlab(thispage))) 93 | mapping = NULL; 94 | 
#endif 95 | else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) 96 | mapping = NULL; 97 | 98 | if((mapping != NULL) && (mapping->host != NULL)) 99 | { 100 | THIS->__retvalue = (unsigned long)(mapping->host->i_ino); 101 | } else { 102 | THIS->__retvalue = 0; 103 | } 104 | %} 105 | 106 | probe module("*").function("make_request@drivers/md/raid5.c") 107 | { 108 | if(target_block == -1) 109 | { 110 | page = $bi->bi_io_vec->bv_page 111 | if(page != 0) 112 | { 113 | inode = get_inode(page) 114 | } 115 | } 116 | 117 | if((target_block == -1)&&(inode_lba_flag ==1)&&(inode == inode_lba_val)) 118 | { 119 | orig_bio = $bi 120 | orig_sector = $bi->bi_sector 121 | } 122 | } 123 | 124 | probe module("*").function("add_stripe_bio@drivers/md/raid5.c") 125 | { 126 | if((target_block == -1)&&(orig_bio == $bi)&&(orig_sector == $bi->bi_sector)) 127 | { 128 | global_shsector = $sh->sector 129 | } 130 | } 131 | 132 | probe kernel.function("generic_make_request") 133 | { 134 | major = $bio->bi_bdev->bd_disk->major 135 | minor = $bio->bi_bdev->bd_disk->first_minor 136 | direction = ($bio->bi_rw) 137 | sector = $bio->bi_sector 138 | 139 | if((target_block == -1)&&(global_shsector == $bio->bi_sector)&&((access_type & 0x01)==(direction & 0x01)||access_type==3)) 140 | { 141 | if((target_minor== -1) && (major == dev_major) && ((dev_minor_min & 0xfff0) <= minor) && (minor <= (dev_minor_max & 0xfff0))) 142 | { 143 | target_minor= minor 144 | target_bio = $bio 145 | } 146 | } 147 | } 148 | 149 | probe module("*").function("scsi_decide_disposition@drivers/scsi/scsi_error.c") 150 | { 151 | scmd_direction = $scmd->sc_data_direction 152 | 153 | if((((temp_failure == 1) || (error_type == 1)) && (target_scmd == $scmd)) && ((scmd_direction == access_type) || (access_type == 3) || ((scmd_direction == 2) && (access_type== 4))) && ($scmd->request->rq_disk != 0)) 154 | { 155 | major = $scmd->request->rq_disk->major 156 | minor = $scmd->request->rq_disk->first_minor 157 | block = 
$scmd->request->sector 158 | req_len = $scmd->request_bufflen 159 | 160 | if(major == dev_major && minor == target_minor && ((block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9))))) 161 | { 162 | if((scmd_direction == 2) && (fix_write == 2)) 163 | { 164 | #fix_write = 0 165 | } else 166 | { 167 | printf("scsi_decide_disposition : major=%d minor=%d scmd=%d \n",major, minor, $scmd) 168 | /* create fake status and sense data */ 169 | temp_failure++ 170 | set_sense_buf($scmd, 0x02, 0x03, 0x11, 0x04) 171 | } 172 | } 173 | } 174 | } 175 | 176 | probe module("*").function("scsi_next_command@drivers/scsi/scsi_lib.c") 177 | { 178 | if((target_access != 0) && (target_scmd == $cmd)) 179 | { 180 | printf("scsi_next_command : cmd = %d \n", $cmd) 181 | target_access = 0 182 | target_scmd = 0 183 | target_rq = 0 184 | restore_state = 0 185 | } 186 | } 187 | 188 | probe module("*").function("scsi_dispatch_cmd@drivers/scsi/scsi.c") 189 | { 190 | struct_bio= $cmd->request->bio 191 | block = $cmd->request->sector 192 | req_len = $cmd->request_bufflen 193 | 194 | if($cmd->request->rq_disk != 0) 195 | { 196 | major = $cmd->request->rq_disk->major 197 | minor = $cmd->request->rq_disk->first_minor 198 | } 199 | 200 | if((target_block == -1) && (target_bio == struct_bio) && (target_minor == minor)) 201 | { 202 | target_block = block 203 | } 204 | 205 | if(target_block == -1) 206 | { 207 | if(struct_bio != 0) 208 | { 209 | page = $cmd->request->bio->bi_io_vec->bv_page 210 | if(page != 0) 211 | { 212 | inode = get_inode(page) 213 | } 214 | } 215 | } 216 | 217 | if(((inode_lba_flag ==1)&&(inode == inode_lba_val)) || ((inode_lba_flag ==0 ) && ((block <= inode_lba_val) && (inode_lba_val < block + (req_len >> 9)))) || (block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9)))) 218 | { 219 | printf("\nSCSI_DISPATCH_CMD: command= %d \n", $cmd->cmnd[0]) 220 | printf("SCSI_DISPATCH_CMD: major= %d minor= %d \n", major, 
minor) 221 | printf("SCSI_DISPATCH_CMD: flag(0:LBA, 1:inode)= %d \n", inode_lba_flag) 222 | printf("SCSI_DISPATCH_CMD: start sector= %d \n", $cmd->request->sector) 223 | printf("SCSI_DISPATCH_CMD: req bufflen= %d \n", $cmd->request_bufflen) 224 | printf("SCSI_DISPATCH_CMD: inode= %d \n", inode) 225 | printf("SCSI_DISPATCH_CMD: scmd = %d \n", $cmd) 226 | printf("SCSI_DISPATCH_CMD: [7]=%d [8]=%d \n", $cmd->cmnd[7],$cmd->cmnd[8]) 227 | 228 | if((target_minor== -1) && (major == dev_major) && ((dev_minor_min & 0xfff0) <= minor) && (minor <= (dev_minor_max & 0xfff0))) 229 | { 230 | tmp_minor = minor 231 | } 232 | 233 | if((major == dev_major && ((minor == tmp_minor) || (minor == target_minor))) && (fix_write != 2)) 234 | { 235 | /* inject errors on the designated device */ 236 | printf("SCSI_DISPATCH_CMD: cmd-retries = %d entire-retry =%d \n", $cmd->retries, entire_retries) 237 | cmd_direction = $cmd->sc_data_direction 238 | if((cmd_direction == 1) && (access_type == 4)) 239 | { 240 | fix_write = 2 241 | printf("SCSI_DISPATCH_CMD: fix_write =%d \n", fix_write) 242 | } 243 | 244 | if((temp_failure == 0) || (error_type == 1) ||((timeout_flag == 1) && (entire_retries < 1))) 245 | { 246 | if((cmd_direction == access_type) || ((cmd_direction == 2) && (access_type == 4)) || (access_type == 3)) 247 | { 248 | if(target_minor == -1) 249 | { 250 | target_minor = tmp_minor 251 | } 252 | 253 | if(target_block == -1) 254 | { 255 | target_block = block 256 | } 257 | 258 | if(target_access == 0) 259 | { 260 | target_bio = struct_bio 261 | retry_allowed = $cmd->allowed 262 | target_access++ 263 | target_scmd = $cmd 264 | target_rq = $cmd->request 265 | } 266 | 267 | temp_failure++ 268 | 269 | if(($cmd->cmnd[0] == 0x28) || ($cmd->cmnd[0] == 0x2a)) 270 | { 271 | /* read_10 or write_10 */ 272 | $cmd->cmnd[7]=0 273 | $cmd->cmnd[8]=0 274 | }else if(($cmd->cmnd[0] == 0x88) || ($cmd->cmnd[0] == 0x8a)) 275 | { 276 | /* read_16 or write_16 */ 277 | $cmd->cmnd[10]=0 278 | $cmd->cmnd[11]=0 279 |
$cmd->cmnd[12]=0 280 | $cmd->cmnd[13]=0 281 | }else if(($cmd->cmnd[0] == 0x08) || ($cmd->cmnd[0] == 0x0a)) 282 | { 283 | /* read_6 or write_6 */ 284 | $cmd->cmnd[4]=0 285 | } 286 | 287 | if(target_scmd == $cmd) 288 | { 289 | entire_retries++ 290 | } 291 | 292 | if((target_access_t == 0) && (timeout_flag == 1)) 293 | { 294 | target_access_t++ 295 | global_scmd = $cmd 296 | restore_state = $cmd->device->host->shost_state 297 | $cmd->device->host->shost_state = 4 298 | } 299 | } 300 | } 301 | 302 | printf("\nSCSI_DISPATCH_CMD: cmd= %d, allowed = %d retries= %d \n", $cmd, $cmd->allowed, $cmd->retries) 303 | printf("SCSI_DISPATCH_CMD:scsi_cmnd= %d (host,channel,id,lun)= (%d, %d, %d, %d) \n", $cmd, $cmd->device->host->host_no, $cmd->device->channel, $cmd->device->id, $cmd->device->lun) 304 | printf("SCSI_DISPATCH_CMD:execname=%s, pexecname=%s\n", execname(), pexecname()) 305 | } 306 | } 307 | } 308 | 309 | -------------------------------------------------------------------------------- /fault_injection_common_scsi_raid56/scsi_fault_injection.stp: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env stap 2 | 3 | # SCSI fault injection library using SystemTap 4 | # Copyright (C) 2007 NEC Corporation 5 | # Copyright(c) Information-technology Promotion Agency, Japan. All rights reserved 2007. 6 | # Result of Open Source Software Development Activities of 7 | # Information-technology Promotion Agency, Japan. 8 | # 9 | # This program is free software; you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation; either version 2 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # This program is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with this program; if not, write to the Free Software 21 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 22 | # 23 | 24 | %{ 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include 33 | #include 34 | #include 35 | %} 36 | 37 | global target_access 38 | global target_scmd 39 | global target_rq 40 | global target_bio 41 | global target_block 42 | global fix_write 43 | global temp_failure 44 | 45 | global error_type 46 | global access_type 47 | global dev_major 48 | global dev_minor_min 49 | global dev_minor_max 50 | global inode_lba_flag 51 | global inode_lba_val 52 | global timeout_flag 53 | global retry_allowed 54 | global target_minor 55 | 56 | global orig_bio 57 | global orig_sector 58 | global global_shsector 59 | 60 | 61 | probe begin 62 | { 63 | orig_sector = -1 64 | global_shsector = -1 65 | target_block = -1 66 | target_minor = -1 67 | printf("\nBEGIN\n") 68 | } 69 | 70 | 71 | function set_sense_buf:long (cmd:long, result:long, sensekey:long, asc:long, ascq:long ) 72 | %{ 73 | struct scsi_cmnd * scmd = (struct scsi_cmnd *)(long)THIS->cmd; 74 | 75 | scmd->result = (int)(long)THIS->result; 76 | scmd->sense_buffer[0] = 0x70; /* current, fixed format */ 77 | scmd->sense_buffer[2] = (unsigned char)(long)THIS->sensekey; 78 | scmd->sense_buffer[7] = 0x13; /* length */ 79 | scmd->sense_buffer[12] = (unsigned char)(long)THIS->asc; 80 | scmd->sense_buffer[13] = (unsigned char)(long)THIS->ascq; 81 | %} 82 | 83 | function get_inode:long (page:long) 84 | %{ 85 | struct page * thispage = (struct page *)(long)THIS->page; 86 | unsigned long tempval = (unsigned long)thispage->flags; 87 | struct address_space *mapping = thispage->mapping; 88 | 89 | if (unlikely(PageSwapCache(thispage))) 90 | mapping = NULL; 91 | #ifdef CONFIG_SLUB 92 | else 
if (unlikely(PageSlab(thispage))) 93 | mapping = NULL; 94 | #endif 95 | else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) 96 | mapping = NULL; 97 | 98 | if((mapping != NULL) && (mapping->host != NULL)) 99 | { 100 | THIS->__retvalue = (unsigned long)(mapping->host->i_ino); 101 | } else { 102 | THIS->__retvalue = 0; 103 | } 104 | %} 105 | 106 | probe module("*").function("make_request@drivers/md/raid5.c") 107 | { 108 | if(target_block == -1) 109 | { 110 | page = $bi->bi_io_vec->bv_page 111 | if(page != 0) 112 | { 113 | inode = get_inode(page) 114 | } 115 | } 116 | 117 | if((target_block == -1)&&(inode_lba_flag ==1)&&(inode == inode_lba_val)) 118 | { 119 | orig_bio = $bi 120 | orig_sector = $bi->bi_sector 121 | } 122 | } 123 | 124 | probe module("*").function("add_stripe_bio@drivers/md/raid5.c") 125 | { 126 | if((target_block == -1)&&(orig_bio == $bi)&&(orig_sector == $bi->bi_sector)) 127 | { 128 | global_shsector = $sh->sector 129 | } 130 | } 131 | 132 | probe kernel.function("generic_make_request") 133 | { 134 | major = $bio->bi_bdev->bd_disk->major 135 | minor = $bio->bi_bdev->bd_disk->first_minor 136 | direction = ($bio->bi_rw) 137 | sector = $bio->bi_sector 138 | 139 | if((target_block == -1)&&(global_shsector == $bio->bi_sector)&&((access_type & 0x01)==(direction & 0x01)||access_type==3)) 140 | { 141 | if((target_minor== -1) && (major == dev_major) && ((dev_minor_min & 0xfff0) <= minor) && (minor <= (dev_minor_max & 0xfff0))) 142 | { 143 | target_minor= minor 144 | target_bio = $bio 145 | } 146 | } 147 | } 148 | 149 | probe module("*").function("scsi_decide_disposition@drivers/scsi/scsi_error.c") 150 | { 151 | scmd_direction = $scmd->sc_data_direction 152 | 153 | if((((temp_failure == 1) || (error_type == 1)) && (target_scmd == $scmd)) && ((scmd_direction == access_type) || (access_type == 3) || ((scmd_direction == 2) && (access_type== 4))) && ($scmd->request->rq_disk != 0)) 154 | { 155 | major = $scmd->request->rq_disk->major 156 | minor = 
$scmd->request->rq_disk->first_minor 157 | block = $scmd->request->sector 158 | req_len = $scmd->request_bufflen 159 | 160 | if(major == dev_major && minor == target_minor && ((block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9))))) 161 | { 162 | if((scmd_direction == 2) && (fix_write == 2)) 163 | { 164 | #fix_write = 0 165 | } else 166 | { 167 | printf("scsi_decide_disposition : major=%d minor=%d scmd=%d \n",major, minor, $scmd) 168 | /* create fake status and sense data */ 169 | temp_failure++ 170 | set_sense_buf($scmd, 0x02, 0x03, 0x11, 0x04) 171 | } 172 | } 173 | } 174 | } 175 | 176 | probe module("*").function("scsi_next_command@drivers/scsi/scsi_lib.c") 177 | { 178 | if((target_access != 0) && (target_scmd == $cmd)) 179 | { 180 | printf("scsi_next_command : cmd = %d \n", $cmd) 181 | target_access = 0 182 | target_scmd = 0 183 | target_rq = 0 184 | restore_state = 0 185 | } 186 | } 187 | 188 | probe module("*").function("scsi_dispatch_cmd@drivers/scsi/scsi.c") 189 | { 190 | struct_bio= $cmd->request->bio 191 | block = $cmd->request->sector 192 | req_len = $cmd->request_bufflen 193 | 194 | if($cmd->request->rq_disk != 0) 195 | { 196 | major = $cmd->request->rq_disk->major 197 | minor = $cmd->request->rq_disk->first_minor 198 | } 199 | 200 | if((target_block == -1) && (target_bio == struct_bio) && (target_minor == minor)) 201 | { 202 | target_block = block 203 | } 204 | 205 | if(target_block == -1) 206 | { 207 | if(struct_bio != 0) 208 | { 209 | page = $cmd->request->bio->bi_io_vec->bv_page 210 | if(page != 0) 211 | { 212 | inode = get_inode(page) 213 | } 214 | } 215 | } 216 | 217 | if(((inode_lba_flag ==1)&&(inode == inode_lba_val)) || ((inode_lba_flag ==0 ) && ((block <= inode_lba_val) && (inode_lba_val < block + (req_len >> 9)))) || (block == target_block) || ((block <= target_block) && (target_block < block + (req_len >> 9)))) 218 | { 219 | printf("\nSCSI_DISPATCH_CMD: command= %d \n", $cmd->cmnd[0]) 220 | 
printf("SCSI_DISPATCH_CMD: major= %d minor= %d \n", major, minor) 221 | printf("SCSI_DISPATCH_CMD: flag(0:LBA, 1:inode)= %d \n", inode_lba_flag) 222 | printf("SCSI_DISPATCH_CMD: start sector= %d \n", $cmd->request->sector) 223 | printf("SCSI_DISPATCH_CMD: req bufflen= %d \n", $cmd->request_bufflen) 224 | printf("SCSI_DISPATCH_CMD: inode= %d \n", inode) 225 | printf("SCSI_DISPATCH_CMD: scmd = %d \n", $cmd) 226 | printf("SCSI_DISPATCH_CMD: [7]=%d [8]=%d \n", $cmd->cmnd[7],$cmd->cmnd[8]) 227 | 228 | if((target_minor== -1) && (major == dev_major) && ((dev_minor_min & 0xfff0) <= minor) && (minor <= (dev_minor_max & 0xfff0))) 229 | { 230 | tmp_minor = minor 231 | } 232 | 233 | if((major == dev_major && ((minor == tmp_minor) || (minor == target_minor))) && (fix_write != 2)) 234 | { 235 | /* inject errors on the designated device */ 236 | printf("SCSI_DISPATCH_CMD: cmd-retries = %d entire-retry =%d \n", $cmd->retries, entire_retries) 237 | cmd_direction = $cmd->sc_data_direction 238 | if((cmd_direction == 1) && (access_type == 4)) 239 | { 240 | fix_write = 2 241 | printf("SCSI_DISPATCH_CMD: fix_write =%d \n", fix_write) 242 | } 243 | 244 | if((temp_failure == 0) || (error_type == 1) ||((timeout_flag == 1) && (entire_retries <= retry_allowed))) 245 | { 246 | if((cmd_direction == access_type) || ((cmd_direction == 2) && (access_type == 4)) || (access_type == 3)) 247 | { 248 | if(target_minor == -1) 249 | { 250 | target_minor = tmp_minor 251 | } 252 | 253 | if(target_block == -1) 254 | { 255 | target_block = block 256 | } 257 | 258 | if(target_access == 0) 259 | { 260 | target_bio = struct_bio 261 | retry_allowed = $cmd->allowed 262 | target_access++ 263 | target_scmd = $cmd 264 | target_rq = $cmd->request 265 | } 266 | 267 | temp_failure++ 268 | 269 | if(($cmd->cmnd[0] == 0x28) || ($cmd->cmnd[0] == 0x2a)) 270 | { 271 | /* read_10 or write_10 */ 272 | $cmd->cmnd[7]=0 273 | $cmd->cmnd[8]=0 274 | }else if(($cmd->cmnd[0] == 0x88) || ($cmd->cmnd[0] == 0x8a)) 275 | { 276 | /*
read_16 or write_16 */ 277 | $cmd->cmnd[10]=0 278 | $cmd->cmnd[11]=0 279 | $cmd->cmnd[12]=0 280 | $cmd->cmnd[13]=0 281 | }else if(($cmd->cmnd[0] == 0x08) || ($cmd->cmnd[0] == 0x0a)) 282 | { 283 | /* read_6 or write_6 */ 284 | $cmd->cmnd[4]=0 285 | } 286 | 287 | if(target_scmd == $cmd) 288 | { 289 | entire_retries++ 290 | } 291 | 292 | if((target_access_t == 0) && (timeout_flag == 1)) 293 | { 294 | target_access_t++ 295 | global_scmd = $cmd 296 | restore_state = $cmd->device->host->shost_state 297 | $cmd->device->host->shost_state = 4 298 | } 299 | } 300 | } 301 | 302 | printf("\nSCSI_DISPATCH_CMD: cmd= %d, allowed = %d retries= %d \n", $cmd, $cmd->allowed, $cmd->retries) 303 | printf("SCSI_DISPATCH_CMD:scsi_cmnd= %d (host,channel,id,lun)= (%d, %d, %d, %d) \n", $cmd, $cmd->device->host->host_no, $cmd->device->channel, $cmd->device->id, $cmd->device->lun) 304 | printf("SCSI_DISPATCH_CMD:execname=%s, pexecname=%s\n", execname(), pexecname()) 305 | } 306 | } 307 | } 308 | 309 | -------------------------------------------------------------------------------- /README.txt: -------------------------------------------------------------------------------- 1 | 2 | The SCSI fault injection test tool 3 | 4 | Revision History: 5 | 6 | rev 1.00 Jan 18 2008 K.Tanaka 7 | - Initial Release. 8 | 9 | 10 | 1. Introduction 11 | ================ 12 | This tool makes it possible to test the error-handling routines of the filesystem 13 | and block I/O layers of a Linux system by injecting SCSI faults. 14 | 15 | This tool generates "pseudo" faults in the SCSI mid-layer, which allows a more 16 | realistic simulation of SCSI device faults. For example, it can simulate device 17 | faults that result in SCSI command timeouts, and media faults that can be corrected 18 | by writing data to the failed sector.
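The medium-error scripts fake a fault in two places: the scsi_dispatch_cmd probe clears the transfer-length field of the READ/WRITE command descriptor block (CDB), and the scsi_decide_disposition probe overwrites the command's result and sense data through set_sense_buf(). The C sketch below mirrors the byte offsets those probes use, as a reading aid only. The helper names (build_fixed_sense, parse_fixed_sense, cdb_transfer_length, zero_transfer_length) are hypothetical and not part of this tool; the opcodes are the standard SBC values (READ(6)/WRITE(6) = 0x08/0x0a, READ(10)/WRITE(10) = 0x28/0x2a, READ(16)/WRITE(16) = 0x88/0x8a).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sense key 0x3 = MEDIUM ERROR, the key the error scripts report. */
#define SENSE_KEY_MEDIUM_ERROR 0x03

/* Hypothetical helper: fill buf with the same bytes set_sense_buf() writes. */
void build_fixed_sense(uint8_t *buf, size_t len,
                       uint8_t key, uint8_t asc, uint8_t ascq)
{
    assert(len >= 14);      /* bytes 0..13 are written below */
    memset(buf, 0, len);
    buf[0] = 0x70;          /* response code: current error, fixed format */
    buf[2] = key & 0x0f;    /* sense key occupies the low nibble of byte 2 */
    buf[7] = 0x13;          /* additional sense length, same value as the scripts */
    buf[12] = asc;          /* additional sense code */
    buf[13] = ascq;         /* additional sense code qualifier */
}

/* Hypothetical helper: decode fixed-format sense data.
 * Returns 0 on success, -1 if buf is too short or not fixed format. */
int parse_fixed_sense(const uint8_t *buf, size_t len,
                      uint8_t *key, uint8_t *asc, uint8_t *ascq)
{
    uint8_t code;

    if (len < 14)
        return -1;
    code = buf[0] & 0x7f;   /* bit 7 is the VALID bit, not part of the code */
    if (code != 0x70 && code != 0x71)   /* 0x71 = deferred error */
        return -1;
    *key = buf[2] & 0x0f;
    *asc = buf[12];
    *ascq = buf[13];
    return 0;
}

/* Hypothetical helper: raw TRANSFER LENGTH field of a READ/WRITE CDB,
 * or -1 for other opcodes. For READ(6)/WRITE(6), SBC interprets a raw
 * value of 0 as 256 blocks; this function returns the raw field. */
int64_t cdb_transfer_length(const uint8_t *cdb)
{
    switch (cdb[0]) {
    case 0x08: case 0x0a:   /* READ(6) / WRITE(6): byte 4 */
        return cdb[4];
    case 0x28: case 0x2a:   /* READ(10) / WRITE(10): bytes 7-8, big-endian */
        return ((int64_t)cdb[7] << 8) | cdb[8];
    case 0x88: case 0x8a:   /* READ(16) / WRITE(16): bytes 10-13, big-endian */
        return ((int64_t)cdb[10] << 24) | ((int64_t)cdb[11] << 16) |
               ((int64_t)cdb[12] << 8) | cdb[13];
    default:
        return -1;
    }
}

/* Hypothetical helper: clear the transfer-length field the way the
 * scsi_dispatch_cmd probe does. Returns 0 if the opcode was recognized. */
int zero_transfer_length(uint8_t *cdb)
{
    switch (cdb[0]) {
    case 0x08: case 0x0a:
        cdb[4] = 0;
        return 0;
    case 0x28: case 0x2a:
        cdb[7] = 0;
        cdb[8] = 0;
        return 0;
    case 0x88: case 0x8a:
        memset(&cdb[10], 0, 4);
        return 0;
    default:
        return -1;
    }
}
```

For example, build_fixed_sense(buf, 18, SENSE_KEY_MEDIUM_ERROR, 0x11, 0x04) produces the same sense key, ASC and ASCQ that the error scripts report, and parse_fixed_sense() recovers them. Keep in mind that zeroing byte 4 of a 6-byte CDB does not request zero blocks: SBC defines a READ(6)/WRITE(6) transfer length of 0 as 256 blocks.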
19 | The user can designate a device currently connected to the system and a target 20 | location within that device at which to inject a fault, so that a test program 21 | using this tool can trigger an error on a specific access. 22 | 23 | Internally, this tool uses SystemTap to rewrite the status code and sense data 24 | of a SCSI command and pass the result to the upper layer, so the real 25 | error-handling routines of the upper layer for the I/O request are exercised. 26 | 27 | This tool was originally created to test software RAID (md/dm-mirror) 28 | on Linux, but any upper-layer application or driver that uses the SCSI mid-layer 29 | can also be tested with it. 30 | 31 | 32 | 2. Requirements 33 | ================ 34 | - Linux system (x86 architecture) with a kernel compiled with debuginfo. 35 | - SystemTap (v5.14.1 or later) installed on the Linux system. 36 | 37 | Please see http://sourceware.org/systemtap/index.html for SystemTap. 38 | 39 | 40 | 3. Supported SCSI fault types 41 | ============================= 42 | 43 | This test tool can inject the following types of faults. 44 | 45 | 3-1. Permanent read error simulation 46 | This type of fault simulates a permanent media error 47 | on a particular sector. Any read access to the sector fails, 48 | but writes succeed. 49 | 50 | 3-2. Permanent read/write error simulation 51 | This type of fault simulates a severe media error. 52 | Both reads and writes to the particular sector fail permanently. 53 | 54 | 3-3. Read error correctable by write simulation 55 | This type of fault simulates a media fault that can be 56 | corrected by writing data to the failed sector. After writing 57 | to the sector, subsequent reads and writes both succeed. 58 | 59 | 3-4. Temporary read error simulation 60 | This type of fault simulates an accidental read fault that occurs only once. 61 | 62 | 3-5. Temporary write error simulation 63 | This type of fault simulates an accidental write fault that occurs only once. 64 | 65 | 3-6.
Temporary no response on a read access simulation 66 | This type of fault simulates a situation, such as congestion, 67 | that results in a SCSI command timeout on a read request. After 68 | the congestion clears, both reads and writes succeed. 69 | 70 | 3-7. Temporary no response on a write access simulation 71 | This type of fault simulates a situation, such as congestion, 72 | that results in a SCSI command timeout on a write request. After 73 | the congestion clears, both reads and writes succeed. 74 | 75 | 3-8. Permanent no response on both read and write access simulation 76 | This type of fault simulates a device fault that results in SCSI 77 | command timeouts on both read and write requests. 78 | Both reads and writes to the particular sector fail permanently. 79 | 80 | 81 | Each fault type is injected by its own SystemTap script (*.stp). 82 | 83 | script name | fault type description 84 | -------------------+------------------------------------------------------- 85 | disk_rerr.stp | permanent read error simulation 86 | disk_rwerr.stp | permanent read/write error simulation 87 | sector_rerr.stp | read error correctable by write simulation 88 | temporary_rerr.stp | temporary read error simulation 89 | temporary_werr.stp | temporary write error simulation 90 | r_timeout.stp | temporary no response on read access simulation 91 | w_timeout.stp | temporary no response on write access simulation 92 | rw_timeout.stp | permanent no response on both read and write access simulation 93 | 94 | disk_rwerr.stp, disk_rerr.stp, sector_rerr.stp, temporary_rerr.stp and temporary_werr.stp 95 | return fake sense data (sense key = 0x3, ASC = 0x11, ASCQ = 0x04) to the upper layer. 96 | This indicates a medium error (UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED). 97 | 98 | The following scripts are also included, but they are for internal use only: 99 | scsi_fault_injection_common.stp - common routine for fault injection 100 | scsi_timeout_injection_common.stp - common routine for timeout injection 101 | 102 | 103 | 4.
Usage 104 | ========= 105 | 106 | The flow is as follows. 107 | For a more detailed example, see doc/sample_usage.txt. 108 | 109 | 1. Setup 110 | 1-1 Install SystemTap and a kernel with debuginfo. 111 | (For more details, see Appendix A) 112 | 113 | 1-2 Decompress the toolset and move it to an arbitrary directory. 114 | 115 | 116 | 2. Decide the type and target of the fault 117 | 118 | 2-1 Fault type 119 | Choose the fault type you want to inject from those described above. 120 | 121 | 2-2 Target 122 | Choose a target device on which to inject a fault. 123 | You need to know the (major, minor) numbers of the target SCSI device. 124 | If you want to inject a fault when a particular file is accessed, 125 | you need to know the inode number of the file (e.g. from the ls -i command). 126 | Alternatively, if you want to inject a fault when a particular 127 | LBA (Logical Block Address) of the target SCSI device is accessed, instead 128 | of a file, you need to choose the LBA number. 129 | 130 | NOTE: The file you want to use to inject a fault may be cached 131 | in memory, but a fault is injected only when an actual access to the 132 | target device is generated. To make sure of this, drop all page 133 | caches with "echo 1 > /proc/sys/vm/drop_caches". 134 | 135 | 136 | 3. Run the stap command 137 | 138 | Run a script to set up a SystemTap hook that causes the fault. Refer to section 139 | "5. Scripts usage details" for more details. 140 | Running the script installs the hook for injecting a fault. 141 | When SystemTap shows the string "Pass 5: starting run.", it is ready. 142 | 143 | 144 | 4. Access the target device 145 | 146 | Running the SystemTap script only sets a trap, so an access to the target 147 | device is needed to actually inject the fault. 148 | e.g. If you run the "disk_rerr.stp" script, you need to read from the target device. 149 | You can generate a read access with the cat command. 150 | If you run "temporary_werr.stp", you need to write to the target device.
151 | You can generate a write access by redirecting output to the file and then running sync. 152 | 153 | 154 | 5. Check the result 155 | 156 | Check the result of the fault injection, then stop the script. 157 | If the fault is successfully injected, the script prints a message 158 | beginning with "scsi_decide_disposition" or "scsi_add_timer", depending 159 | on the fault type. 160 | The "read/write error response" fault scripts (disk_rwerr.stp, 161 | disk_rerr.stp, sector_rerr.stp, temporary_rerr.stp, temporary_werr.stp) 162 | show "scsi_decide_disposition ..." when a fault is successfully injected. 163 | The "no response" fault scripts (r_timeout.stp, w_timeout.stp, rw_timeout.stp) 164 | show "scsi_add_timer ..." when a fault is successfully injected. 165 | 166 | The running SystemTap script does not stop automatically, so you need to 167 | stop it with Ctrl-C or the "pkill stap" command. 168 | 169 | 170 | 5. Scripts usage details 171 | ======================== 172 | 173 | SYNOPSIS 174 | stap