├── dummy.png
├── html
    ├── .gitignore
    └── README.md
├── pdf
    ├── .gitignore
    └── README.md
├── blupsheet.pdf
├── cc88x31.png
├── pd88x31.png
├── installation_start.md
├── replace_date.sh
├── genomic_start.md
├── quicktour_start.md
├── binsol_to_textsol.f90
├── license.md
├── largescale_start.md
├── history.md
├── renum_start.md
├── installation_availability.md
├── NEWS.md
├── acknowledgment.md
├── mrode_start.md
├── README.md
├── mrode_c04ex042_common_environment.md
├── mrode_c12ex121_dominance.md
├── installation_editor.md
├── mrode_c05ex052_mt_missing.md
├── mrode_c03ex032_sire_model.md
├── mrode_c11ex111_fixed_snp.md
├── introduction_plus.md
├── mrode_c11ex115_polygenic.md
├── mrode_c09ex091_fixed_regression.md
├── mrode_c10ex103_qtl.md
├── largescale_issues.md
├── mrode_c05ex053_mt_unequal_design.md
├── renum_norenum.md
├── mrode_c05ex054_mt_no_covariance.md
├── mrode_c04ex041_repeatability_model.md
├── introduction_condition.md
├── installation_linux.md
├── references.md
├── installation_env.md
├── introduction_short.md
├── introduction_about.md
├── mrode_c11ex116_ssgblup.md
├── mrode_c05ex051_mt_equal_design.md
├── mrode_c09ex092_random_regression.md
├── largescale_pcg.md
├── installation_windows.md
├── mrode_c11ex112_mixed_snp.md
├── index.md
├── mrode_c08ex081_social_interaction.md
├── Makefile
├── mrode_c07ex071_maternal.md
├── mrode_c13ex132_threshold_linear.md
├── mrode_c03ex033_reduced_animal_model.md
├── largescale_reliability.md
├── introduction_difference.md
├── mrode_c10ex102_marker_information.md
├── vc_advanced_gs.md
├── genomic_files.md
├── mrode_c12ex123_dominance_inverse.md
├── largescale_reml.md
├── renum_genomic.md
└── renum_mt.md


/dummy.png:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/html/.gitignore:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/pdf/.gitignore:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/blupsheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/masuday/Blupf90TutorialStandard/HEAD/blupsheet.pdf


--------------------------------------------------------------------------------
/cc88x31.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/masuday/Blupf90TutorialStandard/HEAD/cc88x31.png


--------------------------------------------------------------------------------
/pd88x31.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/masuday/Blupf90TutorialStandard/HEAD/pd88x31.png


--------------------------------------------------------------------------------
/html/README.md:
--------------------------------------------------------------------------------
1 | The HTML file is available at the separate page.
2 | 
3 | - <https://masuday.github.io/blupf90_tutorial/index.html>
4 | 


--------------------------------------------------------------------------------
/pdf/README.md:
--------------------------------------------------------------------------------
1 | The pdf file is available at the release page.
2 | 
3 | - <https://github.com/masuday/Blupf90TutorialStandard/releases>
4 | 
5 | Or, you can get the file at the wiki at the UGA website.
6 | 
7 | - <http://nce.ads.uga.edu/wiki/doku.php?id=documentation>
8 | 


--------------------------------------------------------------------------------
/installation_start.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Download and Installation
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | This chapter covers how to install and run the programs. We also introduce methods to save screen messages (output logs) and omit the need to type input at the beginning of a program. A text editor is helpful for working with the files.
10 | 


--------------------------------------------------------------------------------
/replace_date.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/sh
 2 | 
 3 | if [ -z "$1" ]; then
 4 |    echo "usage: $0 date"
 5 |    exit 1
 6 | fi
 7 | 
 8 | # the current date
 9 | date=$1
10 | 
11 | # for the Markdown files
12 | for mdfile in *.md; do
13 |    echo "$mdfile"
14 |    perl -i.bak -pe "s/^date: [A-Za-z, ]*[0-9]*$/date: $date/" "$mdfile"
15 | done
16 | 
17 | # for the LaTeX file
18 | echo tutorial_blupf90.tex
19 | perl -i.bak -pe "s/^\\\\date.*$/\\\\date{$date}/" tutorial_blupf90.tex
20 | 


--------------------------------------------------------------------------------
/genomic_start.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Practical genomic analysis
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | In genomic prediction, the results are often affected by the quality control of SNP markers and the fine-tuning of the genomic relationship matrix. BLUPF90 programs provide many options to stabilize the analysis. Here we will explain the additional features implemented in the programs.
10 | 


--------------------------------------------------------------------------------
/quicktour_start.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Quick tour of BLUPF90
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | In this chapter, we perform several analyses using simple linear models with BLUPF90. The program creates and solves mixed model equations. These small examples will give you a basic understanding of how to prepare a dataset, write a parameter file, and run the program. Even if you are interested in other software or more complex models, we recommend starting with this chapter.
10 | 


--------------------------------------------------------------------------------
/binsol_to_textsol.f90:
--------------------------------------------------------------------------------
 1 | program binsol_to_textsol
 2 |    implicit none
 3 |    integer :: io, t, e, l
 4 |    double precision :: sol, se
 5 |    open(10, file='binary_final_solutions', form='unformatted', &
 6 |             status='old', iostat=io)
 7 |    if(io /= 0) stop
 8 |    open(20, file='final_solutions.txt')
 9 |    write(20,'(" trait / effect level solution               s.e.")')
10 |    do
11 |       read(10, iostat=io) t,e,l,sol,se
12 |       if(io /= 0) exit
13 |       write(20, '(2i4,i10,2f20.8)') t,e,l,sol,se
14 |    end do
15 |    close(10)
16 |    close(20)
17 | end program binsol_to_textsol
18 | 


--------------------------------------------------------------------------------
/license.md:
--------------------------------------------------------------------------------
 1 | 
 2 | ### Main text
 3 | 
 4 | ![Creative Commons License](cc88x31.png)\
 5 | 
 6 | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC-BY-NC-ND). Please check the Creative Commons website, [https://creativecommons.org/licenses/by-nc-nd/4.0/](https://creativecommons.org/licenses/by-nc-nd/4.0/), for details.
 7 | 
 8 | ### Numerical examples
 9 | 
10 | ![Creative Commons License](pd88x31.png)\
11 | 
12 | The files of numerical examples available at a Github repository, [https://github.com/masuday/data/tree/master/tutorial](https://github.com/masuday/data/tree/master/tutorial), are released into the public domain. See the [license documentation](https://github.com/masuday/data/blob/master/LICENSE) for details.
13 | 


--------------------------------------------------------------------------------
/largescale_start.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Large-scale genetic evaluation
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Actual genetic evaluation often handles a large data set. Although the program `blupf90` can read quite large data and pedigree files (for instance, 1 million animals, depending on the available memory), it will fail in a large-scale analysis with tens of millions of individuals. Special software for such a purpose is available under a special contract with the development team at UGA. A member of our team can access and test the software supporting the large data set. In this chapter, we will explain the usage of the programs for large-scale genetic evaluations.
10 | 


--------------------------------------------------------------------------------
/history.md:
--------------------------------------------------------------------------------
1 | - March 2018 (0.8.0): The first revision prepared for the summer course at UGA. Thanks to Andrés Legarra.
2 | - April 2019 (0.9.0): Revised to correct some errors and typos.
3 | - May 2019 (0.9.1): Minor update.
4 | - September 5, 2019 (1.0.0): Revised to correct many errors and typos, to improve the readability of many sentences, and to add new paragraphs.
5 | - September 22, 2019 (1.0.1): Revised to fix some typos and errors. Updated the example for ssGWAS.
6 | - April 30, 2025 (1.1.0): Intensively revised and proofread. Changed paragraph format for newer `pandoc` versions. Added relevant descriptions and removed outdated information. Updated explanations regarding the new programs (`blupf90+` and `gibbsf90+`). Full support for the new programs will be provided in the next edition.
7 | 


--------------------------------------------------------------------------------
/renum_start.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Data preparation with RENUMF90
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | This chapter provides instructions for using RENUMF90 to prepare the files for data, pedigree, genomic markers, and parameters.
10 | BLUPF90 requires several files in a particular format, which the "raw" data may not have. RENUMF90 can check the raw files and convert them to new files suitable for the BLUPF90 suite. We will start with a minimal example as before.
11 | 
12 | More examples are available at the author's GitHub repository [https://github.com/masuday/data](https://github.com/masuday/data). You will find example files with more complicated models, including random regressions, maternal effects, a model with unknown parent groups (UPGs), and GBLUP models.
13 | 


--------------------------------------------------------------------------------
/installation_availability.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Download and Installation
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Availability
10 | ============
11 | 
12 | The BLUPF90 family of programs is available on the official website of the Animal Breeding and Genetics Group at the University of Georgia. This website provides the latest versions of the programs. See the following link:
13 | 
14 | - Animal Breeding and Genetics Group — University of Georgia (<http://nce.ads.uga.edu/>)
15 | 
16 | You can download the programs and use them for personal or academic purposes. Some programs are not available online. These programs are provided under a contract with our team. See the official website for details.
17 | 
18 | The official manual for the program suite is now available on the website. The manual provides detailed descriptions of the available options as well as demonstrations of analyses using various linear models.
19 | 


--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
 1 | News
 2 | ====
 3 | 
 4 | Version 1.1.0: Updates from version 1.0.1
 5 | -----------------------------------------
 6 | 
 7 | - Intensively revised and proofread.
 8 | - Added relevant descriptions and removed outdated information.
 9 | - Updated explanations regarding the new programs, `blupf90+` and `gibbsf90+`.
10 | - Changed the single-quotation code. (`binsol_to_textsol.f90`)
11 | - Updated the support hub from Yahoo Groups to Groups.io. (`introduction_condition.md`)
12 | - Added a notation that the heterogeneous residual variances will not work with the `EM-REML` option in AIREMLF90. (`vc_advanced_aireml.md`)
13 | 
14 | Version 1.0.1: updates from version 1.0.0
15 | -----------------------------------------
16 | 
17 | - Added a link to the author's Github repository to have example files for RENUMF90. (`renum_start.md`)
18 | - Added a link to `index.html`. (`Github.html5.txt`)
19 | - Fixed a typo. (`index.md`)
20 | - Fixed the description of a pipeline in ssGWAS. Fixed the sample package for ssGWAS. (`genomic_gwas.md`)
21 | 
22 | 


--------------------------------------------------------------------------------
/acknowledgment.md:
--------------------------------------------------------------------------------
1 | I would like to thank the faculty, postdocs, students, and visitors in the Department of Animal and Dairy Science at the University of Georgia, as well as the users and developers of the software, for their questions about the BLUPF90 programs, which encouraged me to write this note.
2 | 
3 | While I was writing the initial version of this tutorial, Andrés Legarra reviewed the draft and made a large number of corrections and improvements to it. He also wrote the sections *Renumbering the data without RENUMF90*, *EM-REML algorithm*, *Likelihood ratio test*, and several other subsections.
4 | 
5 | I thank Ignacy Misztal for his development of the wonderful software, Shogo Tsuruta, Ignacio Aguilar, and Andrés Legarra for the discussion on their implementation of algorithms in BLUPF90, Luis Varona, Benoit Auvray, Tomasz Strabel, Tom Druet, Deuk-Hwan Lee, Jesus Arango, Juan Pablo Sánchez, Miguel Pérez-Enciso, and François Guillaume for their contribution to the software development, Daniela Lourenco for the discussion on the design of BLUPF90 programs and her useful comments to improve this tutorial, and Ivan Pocrnić for his comments on the draft of this tutorial.
6 | 


--------------------------------------------------------------------------------
/mrode_start.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Various models shown in Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | In this chapter, we will see how BLUPF90 applies to a variety of models. We will look at various models in a standard textbook, *Linear Models for the Prediction of Animal Breeding Values*, written by Mrode (2014). We explain how to write a parameter file to handle each model in his book. We will not reproduce the contents of the data, pedigree, and genotype files. You can easily create these files if you have the book. We not only solve the equations but also introduce new options and tricks useful for actual data analyses.
10 | 
11 | As mentioned in the Introduction, the textbook has been updated (now in its 4th edition), and a new author, Ivan Pocrnić, has joined. The title of the textbook has also been changed to "Linear Models for the Prediction of the Genetic Merit of Animals". This tutorial is based on the previous edition (3rd edition, published in 2014), and a full update of the tutorial will be made in the future. The numerical examples in the new textbook are (mostly) the same as before, so I believe this tutorial is still useful with the new edition, except for differences in chapter structure.
12 | 
13 | In addition, the computer programs have been updated to `blupf90+`. The parameter file format remains the same, so you can continue to use the previous parameter files with `blupf90+` without modification.
14 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | Introduction to BLUPF90 suite programs
 2 | ======================================
 3 | 
 4 | Written by Yutaka Masuda
 5 | 
 6 | Quick links
 7 | -----------
 8 | 
 9 | - [Index](./index.md)
10 | - [PDF documentation](https://github.com/masuday/Blupf90TutorialStandard/releases)
11 | - [HTML documentation](https://masuday.github.io/blupf90_tutorial/index.html)
12 | - [Numerical examples](https://github.com/masuday/data/tree/master/tutorial)
13 | 
14 | How can I build the documentation?
15 | ----------------------------------
16 | 
17 | You need the following tools.
18 | 
19 | - [Pandoc](https://pandoc.org/) v2.5.0 or later
20 | - pdfLaTeX
21 | - Make and Bash
22 | 
23 | Simply type the `make` command in the repository on Bash, and the programs generate a PDF file in `pdf/` and HTML files in `html/`.
24 | 
25 | ~~~~~
26 | make
27 | ~~~~~
28 | 
29 | To clean up, run `make` with the `clean` argument.
30 | 
31 | ~~~~~
32 | make clean
33 | ~~~~~
34 | 
35 | How can I modify the note?
36 | --------------------------
37 | 
38 | ### Simple way
39 | 
40 | 1. Make a fork of this repository in your account.
41 | 2. Modify the file online; select the file, edit it, and commit it.
42 | 3. After the edit, make a _pull request_.
43 | 4. Wait for the response.
44 | 
45 | ### Another way
46 | 
47 | 1. Make a branch.
48 | 2. Modify the files and commit them.
49 | 3. Make a pull request.
50 | 
51 | Where can I get the numerical examples in this tutorial?
52 | --------------------------------------------------------
53 | 
54 | The numerical examples and the files used in the tutorial are available at my Github repository. All the files in the repository are released in the public domain.
55 | 
56 | - <https://github.com/masuday/data/tree/master/tutorial>
57 | 


--------------------------------------------------------------------------------
/mrode_c04ex042_common_environment.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Numerical examples from Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Common environmental effects as random
10 | ======================================
11 | 
12 | Model
13 | -----
14 | 
15 | In this example, the author considers a simple maternal model in which the maternal effects ($c$) are uncorrelated with
16 | each other. The variance components are the animal genetic variance $\sigma_u^2=20$, the maternal variance $\sigma_c^2=15$, and the residual variance $\sigma_e^2=65$.
17 | 
18 | Files
19 | -----
20 | 
21 | The data file (`data_mr04b.txt`) contains the whole table shown in the textbook (p.68). Here is the explanation for each column.
22 | 
23 | 1. Animal ID (piglet)
24 | 2. Sire ID
25 | 3. Dam ID
26 | 4. Sex (1=male and 2=female)
27 | 5. Weaning weight (kg)
28 | 
29 | Pedigree (`pedigree_mr04b.txt`) can be derived from the above data.
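The derivation of a pedigree from the data columns can be sketched as follows. This is a minimal illustration and not part of the BLUPF90 tools: the function name `derive_pedigree` and the records are hypothetical, and it assumes the usual three-column layout (animal, sire, dam) with 0 for an unknown parent.

```python
def derive_pedigree(records):
    """Build (animal, sire, dam) pedigree rows from data records.

    records: iterable of (animal, sire, dam) integer IDs; 0 means unknown.
    Parents that never appear as an animal are added as founders.
    """
    pedigree = {}
    for animal, sire, dam in records:
        pedigree[animal] = (sire, dam)
    for animal, sire, dam in records:
        for parent in (sire, dam):
            if parent != 0 and parent not in pedigree:
                pedigree[parent] = (0, 0)  # founder with unknown parents
    return [(a, s, d) for a, (s, d) in sorted(pedigree.items())]


# Hypothetical records (animal, sire, dam); these are not the textbook data.
records = [(6, 1, 2), (7, 1, 2), (8, 3, 4), (9, 3, 5)]
pedigree_rows = derive_pedigree(records)
for row in pedigree_rows:
    print(*row)
```

Each printed line corresponds to one row of a pedigree file. Parents that appear only in the sire or dam column are listed as founders.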
30 | 
31 | The parameter file should contain 2 random effects.
32 | 
33 | ~~~~~{language=blupf90 caption="param_mr04b.txt"}
34 | DATAFILE
35 | data_mr04b.txt
36 | NUMBER_OF_TRAITS
37 | 1
38 | NUMBER_OF_EFFECTS
39 | 3
40 | OBSERVATION(S)
41 | 5
42 | WEIGHT(S)
43 | 
44 | EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT  [EFFECT NESTED]
45 | 4  2 cross
46 | 1 15 cross
47 | 3  5 cross    # for maternal environmental effect
48 | RANDOM_RESIDUAL VALUES
49 | 65.0
50 | RANDOM_GROUP
51 | 2
52 | RANDOM_TYPE
53 | add_animal
54 | FILE
55 | pedigree_mr04b.txt
56 | (CO)VARIANCES
57 | 20.0
58 | RANDOM_GROUP
59 | 3
60 | RANDOM_TYPE
61 | diagonal
62 | FILE
63 | 
64 | (CO)VARIANCES
65 | 15.0
66 | OPTION solv_method FSPAK
67 | ~~~~~
68 | 
69 | Solutions
70 | ---------
71 | 
72 | The results are identical to the textbook.
73 | 


--------------------------------------------------------------------------------
/mrode_c12ex121_dominance.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Numerical examples from Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Animal model with dominance effect
10 | ==================================
11 | 
12 | Model
13 | -----
14 | 
15 | Linear mixed models can handle a non-additive genetic model with dominance or epistatic relationship matrices. In the dominance model, we can consider the additive genetic effect and the dominance effect simultaneously. In this example, the author assumes the animal model with one fixed effect, the additive genetic effect, the dominance effect, and the residual effect. The variance components are the additive variance $\sigma_u^2 = 90$, the dominance variance $\sigma_d^2 = 80$, and the residual variance $\sigma_e^2 = 120$. BLUPF90 does not have a function to calculate $\mathbf{D}^{-1}$, so we need to supply the elements as a file to the program.
16 | 
17 | Files
18 | -----
19 | 
20 | A data file (`data_mr12a.txt`) should be prepared. The pedigree file (`pedigree_mr12a.txt`) is also created.
21 | 
22 | The inverse of the dominance relationship matrix is supplied as a text file. See the textbook for details (p.207).
23 | 
24 | ~~~~~{language=text caption="userinverse_mr12a.txt"}
25 |  1  1  1.000
26 |  2  2  1.000
27 | ...
28 | 11 12 -0.241
29 | 12 12  1.092
30 | ~~~~~
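A file in this "row column value" layout could be written from a matrix with a short helper like the one below. It is only a sketch. The function name `upper_triangle_lines` and the 3 by 3 matrix are hypothetical. It also assumes that listing the nonzero upper-triangular elements is sufficient, as the excerpt above suggests. The real $\mathbf{D}^{-1}$ has to come from the textbook.

```python
def upper_triangle_lines(matrix, tol=0.0):
    """Format nonzero upper-triangular elements as 'row col value' lines."""
    lines = []
    n = len(matrix)
    for i in range(n):
        for j in range(i, n):
            value = matrix[i][j]
            if abs(value) > tol:
                # 1-based indices, same column layout as the excerpt above
                lines.append(f"{i + 1:2d} {j + 1:2d} {value:6.3f}")
    return lines


# Hypothetical symmetric inverse matrix, for illustration only.
d_inverse = [
    [1.0, 0.0, 0.0],
    [0.0, 1.25, -0.5],
    [0.0, -0.5, 1.25],
]
lines = upper_triangle_lines(d_inverse)
print("\n".join(lines))
```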
31 | 
32 | The parameter file defines 2 random effects.
33 | 
34 | ~~~~~{language=blupf90 caption="param_mr12a.txt"}
35 | DATAFILE
36 | data_mr12a.txt
37 | NUMBER_OF_TRAITS
38 | 1
39 | NUMBER_OF_EFFECTS
40 | 3
41 | OBSERVATION(S)
42 | 5
43 | WEIGHT(S)
44 | 
45 | EFFECTS:
46 | 4  2 cross # fixed effect
47 | 1 12 cross # additive effect
48 | 1 12 cross # dominance effect
49 | RANDOM_RESIDUAL VALUES
50 | 120.0
51 | RANDOM_GROUP
52 | 2
53 | RANDOM_TYPE
54 | add_animal
55 | FILE
56 | pedigree_mr12a.txt
57 | (CO)VARIANCES
58 | 90.0
59 | RANDOM_GROUP
60 | 3
61 | RANDOM_TYPE
62 | user_file
63 | FILE
64 | userinverse_mr12a.txt
65 | (CO)VARIANCES
66 | 80.0
67 | OPTION solv_method FSPAK
68 | ~~~~~
69 | 
70 | Solutions
71 | ---------
72 | 
73 | The solutions are identical to the reference values in the textbook (p.207).
74 | 


--------------------------------------------------------------------------------
/installation_editor.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Download and Installation
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Set up a text editor
10 | ====================
11 | 
12 | ### What is a text editor?
13 | 
14 | BLUPF90 programs use various text files. A text file contains only characters (basic alphabet, numbers, and symbols). While Microsoft Word can open text files, it is too heavy and not suitable for editing them. A text editor is a type of software specialized for editing text files. Text editors usually offer many features that are useful for editing, and some can open very large files.
15 | 
16 | ### Well-known text editors
17 | 
18 | Common text editors include *Notepad* (Windows), *TextEdit* (macOS), and *vi*, *Emacs*, or *nano* (Linux/Unix). These are the default editors on each system, but you can install other editors for convenience. Notepad, in particular, has too few features to be practical in many cases. There are many free, powerful text editors available for Windows and other platforms.
19 | 
20 | Below are several well-known and free text editors. Some of them are multi-platform and support Linux, macOS, and Windows. Try a few and choose the one that suits you best:
21 | 
22 | - Notepad++ (<https://notepad-plus-plus.org>) — Windows only  
23 | - SciTE (<http://www.scintilla.org/SciTE.html>) — Windows  
24 | - Visual Studio Code (<https://code.visualstudio.com>) — All platforms
25 | 
26 | There are also comparison lists of text editors maintained online. For example, the Wikipedia article (<https://en.wikipedia.org/wiki/Comparison_of_text_editors>) may help you find your favorite. Lastly, if you prefer very traditional, professional editors, consider using `vi` or `Emacs`.
27 | 
28 | ### Editing files over a network
29 | 
30 | You may need to run the programs on a remote server (e.g., accessing a Linux machine over the internet). In such cases, there are several ways to edit text files. Here, we assume the remote server is running Linux (or a Unix-like environment).
31 | 
32 | 1. **Use a text-based editor on the server** (e.g., `vi` or `Emacs`).  
33 |    You edit the file directly in the terminal window.
34 | 
35 | 2. **Use a graphical editor on the server with X forwarding** (e.g., GEdit).  
36 |    The editor appears to run on your local machine but is actually running on the server. It can directly read and write files on the server. A guide is available here: <http://pdc-amd01.poly.edu/Xhowto.html>
37 | 
38 | 3. **Use a local editor that can access remote files**.  
39 |    Some advanced editors (such as `Visual Studio Code`) support editing files over SSH or SFTP.
40 | 


--------------------------------------------------------------------------------
/mrode_c05ex052_mt_missing.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Numerical examples from Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Multiple-trait model with equal design matrix and missing records
10 | =================================================================
11 | 
12 | Model
13 | -----
14 | 
15 | Here we apply the same model described in the previous example to data with missing observations. The model description and the mixed model equations are the same as before.
16 | 
17 | With a missing observation, $\mathbf{R}_0$ and its inverse should be altered. For example, in a 2-trait model, if the observation for the first trait is missing, the first row and column of $\mathbf{R}_0$ should be zeroed out. The corresponding inverse is the generalized inverse of this altered $\mathbf{R}_0$. Illustrating this situation with the previous example, the result is
18 | $$
19 | \mathbf{R}_{0}
20 | =
21 | \left[
22 | \begin{array}{rr}
23 | 0&0\\
24 | 0&30
25 | \end{array}
26 | \right]
27 | \quad
28 | \text{and}
29 | \quad
30 | \mathbf{R}_{0}^{-}
31 | =
32 | \left[
33 | \begin{array}{rr}
34 | 0&0\\
35 | 0&30
36 | \end{array}
37 | \right]^{-}
38 | =
39 | \left[
40 | \begin{array}{rr}
41 | 0&0\\
42 | 0&1/30
43 | \end{array}
44 | \right].
45 | $$
46 | 
47 | The generalized inverse of this zeroed matrix is equivalent to the inverse of the submatrix containing only the nonzero elements of the zeroed matrix (Searle, 1971). BLUPF90 detects a missing observation and prepares an appropriate $\mathbf{R}_{0}$ and its generalized inverse.
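The rule can be checked numerically with a small sketch. The code below is not how BLUPF90 is implemented. The helper names `invert` and `missing_trait_inverse` are hypothetical, and the residual covariance matrix is the one used in the parameter file of this example. It inverts the submatrix of observed traits and puts the result back into a matrix of zeros.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def missing_trait_inverse(r0, observed):
    """Generalized inverse of R0 after zeroing rows/columns of missing traits."""
    idx = [t for t, is_observed in enumerate(observed) if is_observed]
    n = len(r0)
    result = [[0.0] * n for _ in range(n)]
    if idx:
        sub_inv = invert([[r0[i][j] for j in idx] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                result[i][j] = sub_inv[a][b]
    return result


r0 = [[40.0, 11.0], [11.0, 30.0]]
# Trait 1 missing, trait 2 observed: only the element 1/30 is nonzero.
r0_ginv = missing_trait_inverse(r0, observed=[False, True])
# Both traits observed: the ordinary inverse of R0.
r0_inv = missing_trait_inverse(r0, observed=[True, True])
print(r0_ginv)
```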
48 | 
49 | 
50 | Files
51 | -----
52 | 
53 | One animal is added to the previous example and 2 observations are marked as missing. A missing observation is indicated by 0, which is the default missing code used in the BLUPF90 family (`data_mr05b.txt`). The pedigree file is the previous one extended by adding animal 9 (`pedigree_mr05b.txt`).
54 | 
55 | The parameter file is also identical except for omitting an option for standard error calculations.
56 | 
57 | ~~~~~{language=blupf90 caption="param_mr05b.txt"}
58 | DATAFILE
59 | data_mr05b.txt
60 | NUMBER_OF_TRAITS
61 | 2
62 | NUMBER_OF_EFFECTS
63 | 2
64 | OBSERVATION(S)
65 | 5 6
66 | WEIGHT(S)
67 | 
68 | EFFECTS:
69 | 2 2 2 cross
70 | 1 1 9 cross
71 | RANDOM_RESIDUAL VALUES
72 | 40.0 11.0
73 | 11.0 30.0
74 | RANDOM_GROUP
75 | 2
76 | RANDOM_TYPE
77 | add_animal
78 | FILE
79 | pedigree_mr05b.txt
80 | (CO)VARIANCES
81 | 20.0 18.0
82 | 18.0 40.0
83 | OPTION solv_method FSPAK
84 | ~~~~~
85 | 
86 | Solutions
87 | ---------
88 | 
89 | You can confirm the results are identical to the values in the textbook (p.80).
90 | 


--------------------------------------------------------------------------------
/mrode_c03ex032_sire_model.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Numerical examples from Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Sire model
10 | ==========
11 | 
12 | Model
13 | -----
14 | 
15 | Here we apply a sire model to the previous data. The mathematical model is the same as in the previous section, but the individual genetic effect is replaced with the sire genetic effect. In this model, we treat an observation as if it belonged to the sire of the animal rather than to the animal itself. Additive genetic variation is explained by sires. So, the sire variance is $\sigma_s^2 = 0.25\sigma_u^2 = 5$, and the residual variance absorbs the remaining part of the genetic variance, so $\sigma_e^{2*}=0.75\sigma_u^2+\sigma_e^2=55$. The phenotypic variance is the same (60) in the animal and sire models.
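The arithmetic can be verified with a few lines of code. This is only a sketch. The function name `sire_model_variances` is hypothetical, and the animal-model values $\sigma_u^2 = 20$ and $\sigma_e^2 = 40$ are the ones implied by the figures above (5, 55, and a phenotypic variance of 60).

```python
def sire_model_variances(sigma_u2, sigma_e2):
    """Convert animal-model variances to sire-model variances."""
    sigma_s2 = 0.25 * sigma_u2                   # sire variance
    sigma_e2_star = 0.75 * sigma_u2 + sigma_e2   # residual absorbs the rest
    return sigma_s2, sigma_e2_star


# Animal-model variances implied by the text (assumption: 20 and 40).
sigma_s2, sigma_e2_star = sire_model_variances(20.0, 40.0)
print(sigma_s2, sigma_e2_star, sigma_s2 + sigma_e2_star)
```

The two sire-model components go directly into `(CO)VARIANCES` and `RANDOM_RESIDUAL VALUES` in the parameter file.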
16 | 
17 | Files
18 | -----
19 | 
20 | We can use the same data set shown in the previous section (`data_mr03b.txt`). In this case, we use the 3rd column (sire ID) instead of the 1st column (animal ID) as an animal's effect.
21 | 
22 | The pedigree file for the sire model in BLUPF90 is different from the one used in the animal model. This file (`pedigree_mr03b.txt`) contains 3 columns:
23 | 
24 | 1. The ID for sire.
25 | 2. The ID for sire of the sire.
26 | 3. The ID for maternal grandsire (MGS) of the sire.
27 | 
28 | In this case, all MGS are unknown so the 3rd column should be 0 for all animals.
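The three columns can be derived from an ordinary animal pedigree, as in the sketch below. The function name `sire_pedigree` and the small pedigree are illustrative assumptions, not the content of the actual files. The MGS of a sire is taken as the sire of that sire's dam, with 0 for unknown.

```python
def sire_pedigree(animal_pedigree, sires):
    """Return (sire, sire of sire, MGS of sire) rows; 0 means unknown."""
    rows = []
    for sire in sorted(sires):
        sire_of_sire, dam_of_sire = animal_pedigree.get(sire, (0, 0))
        # The MGS is the sire of the sire's dam.
        mgs, _ = animal_pedigree.get(dam_of_sire, (0, 0))
        rows.append((sire, sire_of_sire, mgs))
    return rows


# Illustrative animal pedigree: animal -> (sire, dam); 0 means unknown.
animal_pedigree = {4: (1, 0), 5: (3, 2), 6: (1, 2), 7: (4, 5), 8: (3, 6)}
sires_in_data = {1, 3, 4}
rows = sire_pedigree(animal_pedigree, sires_in_data)
for row in rows:
    print(*row)
```

With this illustrative input, every MGS column is 0, as in the situation described above.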
29 | 
30 | The parameter file is also altered. We put comments around lines to be changed.
31 | 
32 | ~~~~~{language=blupf90 caption="param_mr03b.txt"}
33 | DATAFILE
34 | data_mr03b.txt
35 | NUMBER_OF_TRAITS
36 | 1
37 | NUMBER_OF_EFFECTS
38 | 2
39 | OBSERVATION(S)
40 | 5
41 | WEIGHT(S)
42 | 
43 | EFFECTS:
44 | 2 2 cross
45 | 3 4 cross               # 3rd column = sire effect; 4 sires
46 | RANDOM_RESIDUAL VALUES  # residual variance
47 | 55.0
48 | RANDOM_GROUP
49 | 2
50 | RANDOM_TYPE             # type changed
51 | add_sire
52 | FILE
53 | pedigree_mr03b.txt
54 | (CO)VARIANCES           # sire variance
55 | 5.0
56 | OPTION solv_method FSPAK
57 | ~~~~~
58 | 
59 | The value `add_sire` is a keyword for the sire model in BLUPF90. The pedigree file does not contain sire 2 because this ID is not used, but the program tolerates the missing level.
60 | 
61 | Solutions
62 | ---------
63 | 
64 | The solutions are as follows.
65 | 
66 | ~~~~~{language=text caption="solutions"}
67 | trait/effect level  solution
68 |    1   1         1          4.33567107
69 |    1   1         2          3.38198579
70 |    1   2         1          0.02200220
71 |    1   2         2          0.00000000
72 |    1   2         3          0.01402640
73 |    1   2         4         -0.04304180
74 | ~~~~~
75 | 
76 | The solution for sire ID 2 is 0.0. BLUPF90 always produces the solution 0.0 for a level that is missing in the data and pedigree files. Otherwise, the solutions are identical to those in the textbook (p.48).
77 | 


--------------------------------------------------------------------------------
/mrode_c11ex111_fixed_snp.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Numerical examples from Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Fixed effect model for SNP effects
10 | ==================================
11 | 
12 | Model
13 | -----
14 | 
15 | In this section, the author estimates individual SNP effects as fixed effects based on 8 reference animals that have both genotypes and phenotypes (fat DYD). Only 3 SNP markers are considered in this small example. We also consider the regular additive genetic (polygenic) effects. The model is
16 | $$
17 | y_{i} = \mu + \sum_{k=1}^{3}z_{ik} g_{k} + u_{i} + e_{i}
18 | $$
19 | where $y_i$ is an observation for animal $i$, $\mu$ is the general mean, $z_{ik}$ is the $k$-th weighted marker genotype for animal $i$, i.e., the $(i,k)$ element of the $\mathbf{Z}$ matrix (see VanRaden, 2008), $g_k$ is the $k$-th fixed SNP effect, $u_i$ is the additive genetic (polygenic) effect, and $e_i$ is the residual effect. The pedigree can include more animals than those with records. The variance components are $\sigma_u^2= 35.241$ and $\sigma_e^2 = 245.0$.
20 | 
21 | 
22 | Files
23 | -----
24 | 
25 | In this model, the elements of $\mathbf{Z}$ are used as covariates so they should be saved in the data file. The data file contains the first 3 columns of $\mathbf{Z}$; see p.181 in the textbook. Also, the weight used in this analysis is actually the inverse of EDC so the values should be calculated and stored in the data file.
26 | 
27 | ~~~~~{language=text caption="data_mr11a.txt"}
28 |  13  0  0  1 558  9.0  0.00179211   1.357 -0.357  0.286
29 | ...
30 | ~~~~~
31 | 
32 | The data file (`data_mr11a.txt`) has 10 columns.
33 | 
34 | 1. Animal ID
35 | 2. Sire ID
36 | 3. Dam ID
37 | 4. General mean
38 | 5. EDC
39 | 6. Phenotype (Fat DYD)
40 | 7. Weight = inverse of EDC
41 | 8. Covariate for SNP 1
42 | 9. Covariate for SNP 2
43 | 10. Covariate for SNP 3
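
As a small check of column 7, the weight can be recomputed from the EDC in column 5. The sketch below uses the first data line shown above (EDC = 558); the function name `edc_weight` is hypothetical.

```python
def edc_weight(edc):
    """Weight used in this example: the inverse of EDC."""
    return 1.0 / edc


# First record of data_mr11a.txt has EDC = 558
print(f"{edc_weight(558):.8f}")
```

The printed value should match the 7th column of the first data line.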
44 | 
45 | The pedigree file includes all 26 animals. Animals 1 to 12 have missing parents.
46 | 
47 | The parameter file for the weighted analysis is shown below.
48 | 
49 | ~~~~~{language=blupf90 caption="param_mr11a.txt"}
50 | DATAFILE
51 | data_mr11a.txt
52 | NUMBER_OF_TRAITS
53 | 1
54 | NUMBER_OF_EFFECTS
55 | 5
56 | OBSERVATION(S)
57 | 6
58 | WEIGHT(S)
59 | 7
60 | EFFECTS:
61 |  4  1 cross   # general mean
62 |  8  1 cov     # SNP effect 1
63 |  9  1 cov     # SNP effect 2
64 | 10  1 cov     # SNP effect 3
65 |  1 26 cross   # additive genetic ( polygenic ) effect
66 | RANDOM_RESIDUAL VALUES
67 | 245.0
68 | RANDOM_GROUP
69 | 5
70 | RANDOM_TYPE
71 | add_animal
72 | FILE
73 | pedigree_mr11a.txt
74 | (CO)VARIANCES
75 | 35.241
76 | OPTION solv_method FSPAK
77 | ~~~~~
78 | 
79 | If you conduct the unweighted analysis, remove 7 from the WEIGHT(S) section.
80 | 
81 | 
82 | Solutions
83 | ---------
84 | 
85 | In the weighted analysis, the solutions are the same as the textbook (p.181). The estimate of the general mean is different but I believe this is a typo in the textbook.
86 | 


--------------------------------------------------------------------------------
/introduction_plus.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Introduction
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Update to BLUPF90+/GIBBSF90+ programs
10 | =====================================
11 | 
12 | Since 2022, the BLUPF90 programs have been reorganized, and some programs have been unified. This tutorial is still useful for the new software because the updated versions retain the same functionalities as the previous ones. Here, I would like to briefly outline the differences between the old and new programs. For more details, please refer to the WCGALP proceedings (Lourenco et al., 2022).
13 | 
14 | BLUPF90+
15 | --------
16 | 
17 | - The programs `blupf90`, `remlf90`, and `airemlf90` have been unified into `blupf90+`.
18 |     - The default behavior of `blupf90+` is the same as that of `blupf90`, i.e., building and solving the system of mixed model equations using data and variance components specified in the user-supplied parameter file.
19 |     - The `blupf90+` program can estimate variance components by REML using the option `OPTION method VCE` in the parameter file.
20 |     - You can switch the algorithm from AI (Average Information) to EM (Expectation-Maximization) by using the option `OPTION EM-REML`.
21 | - `blupf90+` now supports more options to improve usability.
22 | - The UGA group has discontinued support and development of the old programs (`blupf90`, `airemlf90`, and `remlf90`) because `blupf90+` is compatible with and can fully replace them.
23 | 
24 | GIBBSF90+
25 | ---------
26 | 
27 | - The Gibbs-sampling programs, including `gibbs2f90`, `gibbs3f90`, `thrgibbs1f90`, and others, have been unified into `gibbsf90+`.
28 |     - The default behavior of `gibbsf90+` is the same as that of `thrgibbs1f90`, i.e., estimating variance components via Gibbs sampling under a linear model (or a threshold model with an option).
29 |     - The usage and options are the same as those in `thrgibbs1f90` or other Gibbs-sampling programs.
30 | - The UGA group has discontinued support and development of the old Gibbs-sampling programs because `gibbsf90+` is compatible with and can fully replace them.
31 | 
32 | This tutorial
33 | -------------
34 | 
35 | The descriptions in this tutorial are based on the old programs, but most of the content should still be applicable to the new ones. In the text, readers can simply replace the old program names with the new ones as shown below.
36 | 
37 | | Old program in the text | New program name | Additional option          |
38 | |-------------------------|------------------|----------------------------|
39 | | `blupf90`               | `blupf90+`       | none                       |
40 | | `airemlf90`             | `blupf90+`       | `OPTION method VCE`        |
41 | | `remlf90`               | `blupf90+`       | `OPTION method VCE`        |
42 | |                         |                  | `OPTION EM-REML 5000`      |
43 | | Gibbs sampling programs | `gibbsf90+`      | none                       |
44 | 


--------------------------------------------------------------------------------
/mrode_c11ex115_polygenic.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Mixed linear models with polygenic effects
 10 | ==========================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | See the textbook for the details.
 16 | 
 17 | Files and solutions for SNP-BLUP
 18 | --------------------------------
 19 | 
 20 | First, we perform the SNP-BLUP analysis. The data file (`data_mr11d1.txt`) is from the previous example.
 21 | 
 22 | We can use the same pedigree defined before (`pedigree_mr11d1.txt`).
 23 | 
 24 | The parameter file contains 10 SNP effects with the residual polygenic effects.
 25 | 
 26 | ~~~~~{language=blupf90 caption="param_mr11d1.txt"}
 27 | DATAFILE
 28 | data_mr11d1.txt
 29 | NUMBER_OF_TRAITS
 30 | 1
 31 | NUMBER_OF_EFFECTS
 32 | 12
 33 | OBSERVATION(S)
 34 | 6
 35 | WEIGHT(S)
 36 | 
 37 | EFFECTS:
 38 |  4  1 cross   # general mean
 39 |  1 26 cross   # residual polygenic effects
 40 |  8  1 cov     # SNP effect 1
 41 |  9  1 cov     # SNP effect 2
 42 | 10  1 cov     # SNP effect 3
 43 | 11  1 cov     # SNP effect 4
 44 | 12  1 cov     # SNP effect 5
 45 | 13  1 cov     # SNP effect 6
 46 | 14  1 cov     # SNP effect 7
 47 | 15  1 cov     # SNP effect 8
 48 | 16  1 cov     # SNP effect 9
 49 | 17  1 cov     # SNP effect 10
 50 | RANDOM_RESIDUAL VALUES
 51 | 245.0
 52 | RANDOM_GROUP  # polygenic effect
 53 | 2
 54 | RANDOM_TYPE
 55 | add_animal
 56 | FILE
 57 | pedigree_mr11d1.txt
 58 | (CO)VARIANCES
 59 | 3.5241
 60 | RANDOM_GROUP  # jointly considering independent 10 SNP effects
 61 | 3 4 5 6 7 8 9 10 11 12
 62 | RANDOM_TYPE
 63 | diagonal
 64 | FILE
 65 | 
 66 | (CO)VARIANCES
 67 | 8.9636 0 0 0 0 0 0 0 0 0
 68 | 0 8.9636 0 0 0 0 0 0 0 0
 69 | 0 0 8.9636 0 0 0 0 0 0 0
 70 | 0 0 0 8.9636 0 0 0 0 0 0
 71 | 0 0 0 0 8.9636 0 0 0 0 0
 72 | 0 0 0 0 0 8.9636 0 0 0 0
 73 | 0 0 0 0 0 0 8.9636 0 0 0
 74 | 0 0 0 0 0 0 0 8.9636 0 0
 75 | 0 0 0 0 0 0 0 0 8.9636 0
 76 | 0 0 0 0 0 0 0 0 0 8.9636
 77 | OPTION solv_method FSPAK
 78 | ~~~~~
 79 | 
 80 | You can check the solutions. See the reference values in the textbook (p.190).
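
Typing a 10 x 10 diagonal block by hand is error-prone, so the `(CO)VARIANCES` lines for the SNP effects can be generated instead. The following is a minimal sketch; the helper name `diagonal_block` is made up, and the output is meant to be pasted into the parameter file.

```python
def diagonal_block(variance, n):
    """Return n lines of an n x n diagonal (co)variance matrix as text."""
    rows = []
    for i in range(n):
        row = [str(variance) if i == j else "0" for j in range(n)]
        rows.append(" ".join(row))
    return "\n".join(rows)


# 10 independent SNP effects, each with variance 8.9636
print(diagonal_block(8.9636, 10))
```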
 81 | 
 82 | Files and solutions for GBLUP
 83 | -----------------------------
 84 | 
 85 | In this example with GBLUP, we use a text file for $\mathbf{G}_{w}^{-1}$ created with PREGSF90 in the previous section (the first approach). Follow the instructions there to prepare the file.
 86 | 
 87 | The data file (`data_mr11d2.txt`) is the same as the previous one.
 88 | 
 89 | The pedigree file is the same as the one used for SNP-BLUP.
 90 | 
 91 | The parameter file is shown below.
 92 | 
 93 | ~~~~~{language=blupf90 caption="param_mr11d2.txt"}
 94 | DATAFILE
 95 | data_mr11d2.txt
 96 | NUMBER_OF_TRAITS
 97 | 1
 98 | NUMBER_OF_EFFECTS
 99 | 3
100 | OBSERVATION(S)
101 | 6
102 | WEIGHT(S)
103 | 
104 | EFFECTS:
105 | 4  1 cross
106 | 1 26 cross  # residual polygenic effect
107 | 8 14 cross  # new ID (renumbered only for genotyped animals)
108 | RANDOM_RESIDUAL VALUES
109 | 245.0
110 | RANDOM_GROUP
111 | 2
112 | RANDOM_TYPE
113 | add_animal
114 | FILE
115 | pedigree_mr11d2.txt
116 | (CO)VARIANCES
117 | 3.5241
118 | RANDOM_GROUP
119 | 3
120 | RANDOM_TYPE
121 | user_file
122 | FILE
123 | Gi
124 | (CO)VARIANCES
125 | 31.717
126 | OPTION solv_method FSPAK
127 | ~~~~~
128 | 
129 | The solutions are available by running the program.
130 | 


--------------------------------------------------------------------------------
/mrode_c09ex091_fixed_regression.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Numerical examples from Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Fixed regression model
10 | ======================
11 | 
12 | Model
13 | -----
14 | 
15 | When repeated measurements are considered as the same trait over time, a repeatability model can be suitable. The assumption in this situation is that the genetic correlation between observations measured at any two time points is 1. Even if the assumption is correct, the observations of a group would follow an average curve over time. This non-genetic, systematic change can be modeled by fixed regression curves.
16 | 
17 | The author shows the test-day measurements of fat yield in dairy cattle. A repeatability model with a fixed regression curve is assumed. See the textbook for details. The variance components are $\sigma_u^2 =5.521$, $\sigma_p^2 = 8.470$, and $\sigma_e^2 = 3.710$.
18 | 
19 | Usually, fixed regressions are nested within an environmental class. This reflects the fact that the average trajectory can differ depending on the specific environment, for instance, herd, region, season, age, parity, and so on. This example is small, so nested fixed regressions are not considered.
20 | 
21 | 
22 | Files
23 | -----
24 | 
25 | The author shows the data set (`data_mr09a.txt`). The Legendre polynomials should be prepared by the user and saved in the data file. The first few lines and the column layout of this file are as follows.
26 | 
27 | ~~~~~{language=text caption="data_mr09a.txt"}
28 |  4   4  1 17.0  0.7071 -1.2247  1.5811 -1.8708  2.1213
29 |  4  38  2 18.6  0.7071 -0.9526  0.6442 -0.0180 -0.6205
30 |  4  72  3 24.0  0.7071 -0.6804 -0.0586  0.7571 -0.7757
31 | ...
32 | ~~~~~
33 | 
34 | 1.  Animal ID (cow)
35 | 2.  Days in milk (DIM)
36 | 3.  Herd-test-day class
37 | 4.  Test day fat yield
38 | 5.  0th order Legendre covariable $\phi_0$ (intercept)
39 | 6.  1st order Legendre covariable $\phi_1$
40 | 7.  2nd order Legendre covariable $\phi_2$
41 | 8.  3rd order Legendre covariable $\phi_3$
42 | 9.  4th order Legendre covariable $\phi_4$
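
Because the covariables in columns 5 to 9 must be prepared by the user, a short sketch of one way to compute them may help. It evaluates normalized Legendre polynomials at DIM standardized to the range from -1 to 1. The DIM range (4 to 310) is an assumption that reproduces the three data lines shown above, and the function name `legendre_covariables` is hypothetical.

```python
import math


def legendre_covariables(dim, dim_min, dim_max, order=4):
    """Normalized Legendre covariables phi_0..phi_order for one DIM value."""
    # Standardize DIM to x in [-1, 1]
    x = -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)
    # Bonnet recurrence: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    # Normalization factor sqrt((2k+1)/2) for each order k
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(order + 1)]


# Assumed DIM range: 4 to 310 (consistent with the rows shown above)
for dim in (4, 38, 72):
    phi = legendre_covariables(dim, 4, 310)
    print(dim, " ".join(f"{v:8.4f}" for v in phi))
```

The printed values can be compared with columns 5 to 9 of the corresponding records before the full data file is written.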
43 | 
44 | 
45 | The pedigree is the same as in Example 4.1 (`pedigree_mr09a.txt`).
46 | 
47 | The parameter file is shown below.
48 | 
49 | ~~~~~{language=blupf90 caption="param_mr09a.txt"}
50 | DATAFILE
51 | data_mr09a.txt
52 | NUMBER_OF_TRAITS
53 | 1
54 | NUMBER_OF_EFFECTS
55 | 8
56 | OBSERVATION(S)
57 | 4
58 | WEIGHT(S)
59 | 
60 | EFFECTS:
61 |  3 10 cross    # HTD
62 |  5  1 cov      # Legendre polynomials (intercept)
63 |  6  1 cov      # Legendre polynomials (1st order)
64 |  7  1 cov      # Legendre polynomials (2nd order)
65 |  8  1 cov      # Legendre polynomials (3rd order)
66 |  9  1 cov      # Legendre polynomials (4th order)
67 |  1  8 cross    # for additive genetic effect
68 |  1  8 cross    # for permanent environmental effect
69 | RANDOM_RESIDUAL VALUES
70 | 3.710
71 | RANDOM_GROUP
72 | 7
73 | RANDOM_TYPE
74 | add_animal
75 | FILE
76 | pedigree_mr09a.txt
77 | (CO)VARIANCES
78 | 5.521
79 | RANDOM_GROUP
80 | 8
81 | RANDOM_TYPE
82 | diagonal
83 | FILE
84 | 
85 | (CO)VARIANCES
86 | 8.470
87 | OPTION solv_method FSPAK
88 | ~~~~~
89 | 
90 | Solutions
91 | ---------
92 | 
93 | The equations have a dependency so the solutions for fixed effects are not unique. Besides the fixed effects, the results from BLUPF90 are slightly different from the textbook.
94 | 


--------------------------------------------------------------------------------
/mrode_c10ex103_qtl.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Directly predicting the additive genetic merit with QTL
 10 | =======================================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | There is an approach to directly predict an animal’s marker genetic merit. The QTL relationship matrix ($\mathbf{G}_v$) can be reduced to a relationship matrix among animals ($\mathbf{A}_v$). The mathematical model contains fixed effects, additive polygenic effects and additive genetic effects related to the marker. The system of mixed model equations is
 16 | $$
 17 | \left[
 18 | \begin{array}{lll}
 19 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{W}\\
 20 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{A}_{u}^{-1}/\sigma_u^{2} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{W}\\
 21 | \mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{Z} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \mathbf{A}_{v}^{-1}/\sigma_q^{2}\\
 22 | \end{array}
 23 | \right]
 24 | \left[
 25 | \begin{array}{c}
 26 | \mathbf{\hat{b}}\\
 27 | \mathbf{\hat{u}}\\
 28 | \mathbf{\hat{q}}
 29 | \end{array}
 30 | \right]
 31 | =
 32 | \left[
 33 | \begin{array}{l}
 34 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\
 35 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \\
 36 | \mathbf{W}'\mathbf{R}^{-1}\mathbf{y}
 37 | \end{array}
 38 | \right].
 39 | $$
 40 | The author assumes $\sigma_u^2 = 0.30$, $\sigma_q^2 = 0.10$, and $\sigma_e^2 = 0.60$.
 41 | 
 42 | 
 43 | Files
 44 | -----
 45 | 
 46 | We use the same data set as the previous example except for removing the paternal and maternal QTL effects (`data_mr10b.txt`). An explanation for each column is given as follows.
 47 | 
 48 | 1. Animal ID (calf)
 49 | 2. Sex (1=male and 2=female)
 50 | 3. Sire ID
 51 | 4. Dam ID
 52 | 5. Post weaning weight (kg)
 53 | 
 54 | The pedigree file is also the same as before (`pedigree_mr10b.txt`). It has the 4th column with the inb/upg code.
 55 | 
 56 | In this case, we should prepare $\mathbf{A}_{v}^{-1}$ as a user-supplied file. The following file contains its diagonal and upper-triangular elements.
 57 | 
 58 | ~~~~~{language=text caption="userinverse_mr10b.txt"}
 59 |   1 1  4.966
 60 |   1 2  0.286
 61 |   1 3 -0.148
 62 | ...
 63 |   4 4  5.978
 64 |   4 5 -2.971
 65 |   5 5  4.836
 66 | ~~~~~
 67 | 
 68 | The parameter file is as follows.
 69 | 
 70 | ~~~~~{language=blupf90 caption="param_mr10b.txt"}
 71 | DATAFILE
 72 | data_mr10b.txt
 73 | NUMBER_OF_TRAITS
 74 | 1
 75 | NUMBER_OF_EFFECTS
 76 | 3
 77 | OBSERVATION(S)
 78 | 5
 79 | WEIGHT(S)
 80 | 
 81 | EFFECTS:
 82 | 2 2 cross    # fixed effect
 83 | 1 5 cross    # additive polygenic effect
 84 | 1 5 cross    # additive QTL effect
 85 | RANDOM_RESIDUAL VALUES
 86 | 0.60
 87 | RANDOM_GROUP
 88 | 2
 89 | RANDOM_TYPE  # considering inbreeding
 90 | add_an_upginb
 91 | FILE
 92 | pedigree_mr10b.txt
 93 | (CO)VARIANCES
 94 | 0.30
 95 | RANDOM_GROUP
 96 | 3
 97 | RANDOM_TYPE  # reading user-supplied file
 98 | user_file
 99 | FILE         # its file name
100 | userinverse_mr10b.txt
101 | (CO)VARIANCES
102 | 0.10
103 | OPTION solv_method FSPAK
104 | ~~~~~
105 | 
106 | Solutions
107 | ---------
108 | 
109 | You can confirm the solutions are identical to the textbook.
110 | 


--------------------------------------------------------------------------------
/largescale_issues.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Large-scale genetic evaluation
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Issues in a large scale analysis
10 | ================================
11 | 
12 | Modern computers can finish genetic evaluations or variance component estimation within a practical time, even with relatively large data sets and complicated models that could not be handled in the past. Even so, a large-scale genetic evaluation remains a challenge in both computing time and required memory. The BLUPF90, AIREMLF90, and GIBBSF90 programs are designed to store the system of mixed model equations (MME) in memory, and they may be able to handle about 1 million pedigree animals in a single-trait model. This is not enough for a large-scale analysis that includes more animals and effects.
13 | 
14 | There are several issues to be solved in large-scale genetic evaluation.
15 | 
16 | - The solution of the mixed model equations.
17 | - Calculation of accuracy or reliability of individual EBV.
18 | - Estimation of variance components.
19 | 
20 | The mixed model equations will be too large to be stored in memory in a large analysis, so we need a way to solve the equations without explicitly building them. In this case, we cannot directly calculate the inverse of the left-hand side of the equations, so the prediction error variance of an animal's EBV is not available. Even if the mixed model equations are small enough to fit in memory, the computational cost of the inverse is extremely high, so this calculation is often impractical. We usually use subsets of the whole data for variance component estimation. Even so, the computation can take a very long time, or may never finish, if the data set is large or the model is complicated.
21 | 
22 | We have several options for large-scale analysis. The BLUP90IOD2 program supports the iteration-on-data technique, which is combined with an iterative method for solving the equations. The algorithm partially builds the equations while reading the data and pedigree files and indirectly updates the solutions. Once it has read through the files, it has also completed one round of iteration. BLUP90IOD2 implements the Preconditioned Conjugate Gradient (PCG) method as the iterative solver. See Tsuruta et al. (2001) for details.
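
As a rough illustration of the PCG method named above, the following Python sketch solves a small symmetric positive-definite system with a diagonal (Jacobi) preconditioner. It is not the BLUP90IOD2 implementation: the coefficient matrix here is stored explicitly, whereas iteration on data forms the matrix-vector product while reading the files. The function name `pcg`, the tolerance, and the example matrix are assumptions made for this example.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b by conjugate gradient with a diagonal preconditioner."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                # residual b - A x for x = 0
    m_inv = [1.0 / A[i][i] for i in range(n)]  # inverse of diag(A)
    z = [m_inv[i] * r[i] for i in range(n)]
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    b_norm2 = sum(v * v for v in b)
    for it in range(1, max_iter + 1):
        # Matrix-vector product; the only place where A is needed
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        # Stop when the relative residual norm is below tol
        if sum(v * v for v in r) <= tol * tol * b_norm2:
            return x, it
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Toy system (hypothetical numbers, not from any data set in this tutorial)
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
solution, rounds = pcg(A, b)
print(solution, rounds)
```

The point of the design is that `A` appears only in the product `Ap`, which is why the product can be accumulated record by record without ever holding the full equations in memory.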
23 | 
24 | The ACCF90 program approximates the accuracy or reliability of EBV for an animal. This program creates approximated elements of the left-hand side matrix for each animal. After collecting all the information, the program inverts the elements to obtain the approximated prediction error variance (PEV). This is an iterative method, but it converges quickly. The basic idea is from Misztal and Wiggans (1988) and Strabel et al. (2001).
25 | 
26 | AI REML is a primary method for variance component estimation. A typical AI REML computation involves the inversion of the left-hand side of the mixed model equations. The FSPAK package quickly calculates the selected elements of the inverse needed for AI REML. It still does a good job for a small-scale analysis, but it is not practical for a large-scale analysis. The package was written more than 20 years ago, and its fundamental routines were written in the early 1980s. The new package, YAMS (Yet Another MME Solver), could remove the bottleneck. The results from YAMS are compatible with the previous software, and YAMS supports parallel computing and other modern techniques for highly efficient computation. AIREMLF90 and REMLF90 can now use YAMS with an option. See Masuda et al. (2014) and Masuda et al. (2015) for details.
27 | 


--------------------------------------------------------------------------------
/mrode_c05ex053_mt_unequal_design.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Multiple-trait model with unequal design matrix
 10 | ===============================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | When each trait has a different model, the design matrices are unequal. Even in this case, the matrix notation of the model and the mixed model equations are the same as described before.
 16 | 
 17 | BLUPF90 can handle a multiple-trait model with unequal design matrices in 2 ways. Both approaches provide the same results, so the choice depends only on the user's preference [^1].
 18 | 
 19 | [^1]: Gibbs sampling programs accept only one of the two ways.
 20 | 
 21 | Files
 22 | -----
 23 | 
 24 | The data are shown in the textbook on p.81 and are saved here as the file `data_mr05c.txt`. The file contains 2 different class effects, and each trait uses one of them.
 25 | 
 26 | 1. Animal ID (Cow)
 27 | 2. Sire ID
 28 | 3. Dam ID
 29 | 4. HYS 1 (for Fat yield 1)
 30 | 5. HYS 2 (for Fat yield 2)
 31 | 6. Fat yield 1
 32 | 7. Fat yield 2
 33 | 
 34 | The pedigree (`pedigree_mr05c.txt`) is from the previous example.
 35 | 
 36 | First, we show a parameter file to handle the unequal design matrices in one way.
 37 | 
 38 | ~~~~~{language=blupf90 caption="param_mr05c.txt"}
 39 | DATAFILE
 40 | data_mr05c.txt
 41 | NUMBER_OF_TRAITS
 42 | 2
 43 | NUMBER_OF_EFFECTS
 44 | 2
 45 | OBSERVATION(S)
 46 | 6 7
 47 | WEIGHT(S)
 48 | 
 49 | EFFECTS:
 50 | 4 5 2 cross
 51 | 1 1 8 cross
 52 | RANDOM_RESIDUAL VALUES
 53 | 65.0 27.0
 54 | 27.0 70.0
 55 | RANDOM_GROUP
 56 | 2
 57 | RANDOM_TYPE
 58 | add_animal
 59 | FILE
 60 | pedigree_mr05c.txt
 61 | (CO)VARIANCES
 62 | 35.0 28.0
 63 | 28.0 30.0
 64 | OPTION solv_method FSPAK
 65 | ~~~~~
 66 | 
 67 | Look at the first line in `EFFECTS:`, which refers to 2 different columns: column 4 for trait 1 and column 5 for trait 2. In this way, you can put different effects together in a single statement. You should specify the maximum number of levels among the effects listed in this statement. In this case, the maximum level in column 4 is 2 and the maximum level in column 5 is also 2, so you can put 2 as the number of levels.
 68 | 
 69 | 
 70 | Solutions
 71 | ---------
 72 | 
 73 | The solutions are identical to those shown in the textbook on p.82.
 74 | 
 75 | 
 76 | Another parameter file
 77 | ----------------------
 78 | 
 79 | BLUPF90 supports a different way to handle unequal design matrices. Consult the following parameter file with the same data and pedigree files.
 80 | 
 81 | ~~~~~{language=blupf90 caption="param_mr05c1.txt"}
 82 | DATAFILE
 83 | data_mr05c.txt
 84 | NUMBER_OF_TRAITS
 85 | 2
 86 | NUMBER_OF_EFFECTS
 87 | 3
 88 | OBSERVATION(S)
 89 | 6 7
 90 | WEIGHT(S)
 91 | 
 92 | EFFECTS:
 93 | 4 0 2 cross
 94 | 0 5 2 cross
 95 | 1 1 8 cross
 96 | RANDOM_RESIDUAL VALUES
 97 | 65.0 27.0
 98 | 27.0 70.0
 99 | RANDOM_GROUP
100 | 3
101 | RANDOM_TYPE
102 | add_animal
103 | FILE
104 | pedigree_mr05c.txt
105 | (CO)VARIANCES
106 | 35.0 28.0
107 | 28.0 30.0
108 | OPTION solv_method FSPAK
109 | ~~~~~
110 | 
111 | In this parameter file, the fixed effect for each trait is separately defined. The column number will be 0 for a trait which does not need the effect. In this case, the first line describes the effect $HYS_1$ and only the first trait needs this effect (so you should put 0 to the second trait). Note that the Gibbs sampling programs (including GIBBS2F90 and THRGIBBS1F90) accept only this style.
112 | 
113 | You can immediately find the results are identical to the previous ones. Some effects ($HYS_1$ for trait 2 and $HYS_2$ for trait 1) are not estimated because they are not defined in the parameter file.
114 | 


--------------------------------------------------------------------------------
/renum_norenum.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Data preparation with RENUMF90
 3 | author: Andres Legarra
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | 
10 | What if I do not want to use RENUMF90?
11 | =====================================
12 | 
13 | Sometimes the user does not want to use RENUMF90. Common reasons for this are:
14 | 
15 |   - It is simulated data and it is already renumbered
16 |   - I do my own renumbering and I want to use it
17 |   - I have a complex model and I want better control over the recoded numbers
18 | 
19 | In the next section, we will give instructions on how the files should be coded. We will *not* give software but you may take a look [here](https://github.com/alegarra/yarp).
20 | 
21 | Main requirements of recoding for BLUPF90 programs
22 | ---------------------------------------------
23 | 
24 |   - Levels of effects should be coded with consecutive numbers. The order may be "natural" or not. For instance, the years (2010, 2011, 2013, 2014) in the data may be renumbered as (1,2,3,4) or (2,1,3,4).
25 |   - The animal is another effect, so it has to receive codes from 1 to the number of animals _in the pedigree file_ (this is usually greater than the number of animals in the data file).
26 |   - The animal effect does not need to be renumbered in chronological order (parents before offspring), but this order is a correct one and may be used. So, for instance, this renumbering is correct:
27 | 
28 | ~~~{language=text caption="ped_kempthorne"}
29 |   A 0 0
30 |   B 0 0
31 |   D A B
32 |   E A D
33 |   F B E
34 |   Z A B
35 | ~~~
36 | 
37 | could be recoded as
38 | 
39 | ~~~{language=text caption="ped_kempthorne_recoded"}
40 | 1 0 0 A 0 0
41 | 2 0 0 B 0 0
42 | 3 1 2 D A B
43 | 4 1 2 Z A B
44 | 5 1 3 E A D
45 | 6 2 5 F B E
46 | ~~~
47 | 
48 | (the last three columns are not needed but help the visualization).
49 | 
50 | 
51 | Unknown parent groups
52 | ---------------------
53 | 
54 | However, if unknown parent groups (UPGs) are in the pedigree, things get more complicated. Assume that we have $n$ _real_ animals and $m$ UPGs. Animals have to be renumbered with codes from 1 to $n$, and UPGs have to be renumbered with codes from $n+1$ to $n+m$. In addition, because they do not have ancestors, the UPGs must not have a line of their own in the pedigree file for BLUPF90. In that way, BLUPF90 "knows" that a given number is a UPG and not an individual. For instance, in the previous example assume that the UPGs are "unknown2000" and "unknown2004":
55 | 
56 | ~~~{language=text caption="ped_kempthorne_unknown parent groups"}
57 |   A unknown2000 unknown2000
58 |   B unknown2000 unknown2004
59 |   D A B
60 |   E A D
61 |   F B E
62 |   Z A B
63 | ~~~
64 | 
65 | It should be recoded as follows.
66 | 
67 | ~~~{language=text caption="ped_kempthorne_unknown parent groups_recoded"}
68 | 1 7 7 A unknown2000 unknown2000
69 | 2 7 8 B unknown2000 unknown2004
70 | 3 1 2 D A B
71 | 4 1 3 E A D
72 | 5 2 4 F B E
73 | 6 1 2 Z A B
74 | ~~~
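
The recoding rule for UPGs can be sketched in a few lines of Python. This is only an illustration under simple assumptions: every real animal has its own pedigree line, every parent is either a real animal or a UPG, and the function name `recode_pedigree` is made up for this example.

```python
def recode_pedigree(pedigree, upg_names):
    """Recode animals as 1..n (in file order) and UPGs as n+1..n+m."""
    animals = [record[0] for record in pedigree]
    codes = {name: i for i, name in enumerate(animals, start=1)}
    n = len(animals)
    for k, upg in enumerate(upg_names, start=1):
        codes[upg] = n + k   # UPGs get codes after all real animals
    # UPGs get no line of their own; only real animals are written
    return [(codes[a], codes[s], codes[d]) for a, s, d in pedigree]


pedigree = [
    ("A", "unknown2000", "unknown2000"),
    ("B", "unknown2000", "unknown2004"),
    ("D", "A", "B"),
    ("E", "A", "D"),
    ("F", "B", "E"),
    ("Z", "A", "B"),
]
for new, old in zip(recode_pedigree(pedigree, ["unknown2000", "unknown2004"]), pedigree):
    print(*new, *old)
```

With the six animals above, the UPGs receive the codes 7 and 8, as in the recoded file.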
75 | 
76 | Genotypes
77 | ---------
78 | 
79 | Genotypes should be coded as 0/1/2 for {AA, Aa, aa} and 5 for missing, as described elsewhere in this tutorial, in a fixed-width text file. The first column of this file contains (possibly alphanumeric) identifiers for animals, for instance:
80 | 
81 | ~~~{language=text caption="genotype_file"}
82 |    A 120120202102111
83 |    Z 121111202111010
84 | ~~~
85 | 
86 | The only important requirement is the creation of the cross-reference file (usually called `_XrefID`). Each line of this file contains the new ID used in the renumbered pedigree and data files, followed by the original ID. In the previous example it would be:
87 | 
88 |  ~~~{language=text caption="genotype_file_XrefID"}
89 |   1  A
90 |   6  Z
91 |  ~~~
92 | 
93 | Overall mean
94 | ------------
95 | 
96 | BLUPF90 programs do _not_ include an overall mean by default! If you need one, add it yourself as a column of 1s in the data file.
97 | 


--------------------------------------------------------------------------------
/mrode_c05ex054_mt_no_covariance.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Multiple-trait model with no environmental covariance
 10 | =====================================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | In a 2-trait model, there is a situation where some animals are recorded only for one trait and the other animals are recorded only for the other trait. In other words, no animals have observations for both traits. This situation also happens when evaluating genotype by environment (G by E) effects. Animals in environment 1 do not have records in environment 2, and vice versa. In this case, we should assume that the residual covariance is exactly zero.
 16 | 
 17 | The author assumes dual-purpose sires in cattle. The male and female calves are raised in different feeding systems. Male calves are recorded for yearling weight and females for fat yield. The genetic covariance is nonzero but the residual covariance is zero. Note that the results from BLUPF90 differ from the values in the textbook.
 18 | 
 19 | 
 20 | Files
 21 | -----
 22 | 
 23 | The data set (`data_mr05d.txt`) is shown in the textbook on p.85. Animal 4 can be omitted from the data file because it has no observations.
 24 | 
 25 | 1. Animal ID (Calf)
 26 | 2. Sex (1=male and 2=female)
 27 | 3. Sire ID
 28 | 4. Dam ID
 29 | 5. HYS
 30 | 6. Yearling weight (kg)
 31 | 7. Fat yield (kg)
 32 | 
 33 | The pedigree file (`pedigree_mr05d.txt`) is derived from the above data set.
 34 | 
 35 | The parameter file should be the following.
 36 | 
 37 | ~~~~~{language=blupf90 caption="param_mr05d.txt"}
 38 | DATAFILE
 39 | data_mr05d.txt
 40 | NUMBER_OF_TRAITS
 41 | 2
 42 | NUMBER_OF_EFFECTS
 43 | 2
 44 | OBSERVATION(S)
 45 | 6 7
 46 | WEIGHT(S)
 47 | 
 48 | EFFECTS:
 49 | 5 5 3 cross
 50 | 1 1 17 cross
 51 | RANDOM_RESIDUAL VALUES
 52 | 77.0  0.0
 53 |  0.0 70.0
 54 | RANDOM_GROUP
 55 | 2
 56 | RANDOM_TYPE
 57 | add_animal
 58 | FILE
 59 | pedigree_mr05d.txt
 60 | (CO)VARIANCES
 61 | 43.0 18.0
 62 | 18.0 30.0
 63 | OPTION solv_method FSPAK
 64 | ~~~~~
 65 | 
 66 | Solutions
 67 | ---------
 68 | 
 69 | The solutions are different from the textbook.
 70 | 
 71 | ~~~~~{language=text caption="solutions"}
 72 | trait/effect level  solution
 73 |    1   1         1        412.26462367
 74 |    2   1         1        194.02892921
 75 |    1   1         2        276.21351695
 76 |    2   1         2        204.76619557
 77 |    1   1         3          0.00000000
 78 |    2   1         3        161.66294167
 79 |    1   2         1         -3.36497774
 80 |    2   2         1          1.25823921
 81 |    1   2         2         -1.48909004
 82 |    2   2         2          3.77372023
 83 |    1   2         3          4.23664594
 84 |    2   2         3         -1.68697171
 85 |    1   2         4         -6.93946803
 86 |    2   2         4         -1.57147632
 87 |    1   2         5         -5.01220042
 88 |    2   2         5         -2.09813041
 89 |    1   2         6          5.01220042
 90 |    2   2         6          2.09813041
 91 |    1   2         7          2.13678047
 92 |    2   2         7          3.56130079
 93 |    1   2         8         -4.27356095
 94 |    2   2         8         -7.12260158
 95 |    1   2         9        -12.16152864
 96 |    2   2         9         -3.09074655
 97 |    1   2        10         -8.26284566
 98 |    2   2        10         -1.26033550
 99 |    1   2        11          5.83581177
100 |    2   2        11          3.77631522
101 |    1   2        12         12.63228129
102 |    2   2        12          3.55770600
103 |    1   2        13          1.52268184
104 |    2   2        13          5.97107079
105 |    1   2        14         -4.29201845
106 |    2   2        14        -11.52738822
107 |    1   2        15         -1.87022083
108 |    2   2        15          0.01073376
109 |    1   2        16          4.29035684
110 |    2   2        16         11.99499708
111 |    1   2        17          2.68411368
112 |    2   2        17          1.66338290
113 | ~~~~~
114 | 


--------------------------------------------------------------------------------
/mrode_c04ex041_repeatability_model.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Numerical examples from Mrode (2014)
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Repeatability model
10 | ===================
11 | 
12 | Model
13 | -----
14 | 
15 | A repeatability model is an animal model applicable when an animal has more than one observation. In such a case, we can split the residual effect into two parts: a "true" temporary environmental effect and a permanent environmental effect that belongs to the animal. The model therefore contains 2 kinds of animal effects (additive genetic and permanent environmental effects).
16 | 
17 | The author assumes the repeatability model in Example 4.1 for fat yield in dairy cattle. The model has the fixed effects for parity and Herd-Year-Season (HYS), the additive genetic effect for an animal, the permanent environmental effect for the same animal, and the random residual effect. The author assumes the genetic variance is 20, the residual variance is 28, and the permanent-environmental variance is 12.
18 | 
19 | Files
20 | -----
21 | 
22 | The data file (`data_mr04a.txt`) contains the table shown on p.63 of the textbook. This file has 6 columns. Columns 2 and 3 are not actually used in this analysis.
23 | 
24 | 1. Animal ID (cow)
25 | 2. Sire ID
26 | 3. Dam ID
27 | 4. Parity (1 or 2)
28 | 5. HYS (from 1 to 4)
29 | 6. Fat yield (kg)
30 | 
31 | The pedigree file (`pedigree_mr04a.txt`) can be prepared as a usual 3-column file.
32 | 
33 | This model contains 2 random effects besides the random error. The permanent environmental effects are not related to each other, so you should use `diagonal` in the `RANDOM_TYPE` keyword. In this case, the permanent environmental effects are read from the same column as the additive genetic effects, because both are individual animal effects; the only difference is the covariance structure. The parameter file is as follows.
34 | 
35 | ~~~~~{language=blupf90 caption="param_mr04a.txt"}
36 | DATAFILE
37 | data_mr04a.txt
38 | NUMBER_OF_TRAITS
39 | 1
40 | NUMBER_OF_EFFECTS
41 | 4
42 | OBSERVATION(S)
43 | 6
44 | WEIGHT(S)
45 | 
46 | EFFECTS:
47 | 5 4 cross
48 | 4 2 cross
49 | 1 8 cross  # for additive genetic effect
50 | 1 8 cross  # for permanent environmental effect
51 | RANDOM_RESIDUAL VALUES
52 | 28.0
53 | RANDOM_GROUP   # additive genetic effect
54 | 3
55 | RANDOM_TYPE
56 | add_animal
57 | FILE
58 | pedigree_mr04a.txt
59 | (CO)VARIANCES
60 | 20.0
61 | RANDOM_GROUP  # permanent environmental effect
62 | 4
63 | RANDOM_TYPE
64 | diagonal
65 | FILE
66 | 
67 | (CO)VARIANCES
68 | 12.0
69 | OPTION solv_method FSPAK
70 | ~~~~~
71 | 
72 | Solutions
73 | ---------
74 | 
75 | BLUPF90 calculates the solutions (not shown here) with the above parameter file. The left-hand side of the mixed model equations is not full rank, and 2 solutions for fixed effects are replaced with 0. The positions of the zero-constraints are different from the textbook (p.64; Parity 1 and 2 here vs HYS 1 and 3 in the textbook). As before, a linear contrast is the same in both cases. BLUP for random effects is the same as in the textbook. Solutions for PE for animals 1 to 3 are 0 (the expected value of a random effect with no information) because they have no observations. In this case, column 1 was shared by the additive genetic and permanent environmental effects.
76 | 
77 | 
78 | Putting arbitrary zero-constraints manually
79 | -------------------------------------------
80 | 
81 | There is a trick to manually put user-defined zero-constraints on the mixed model equations. You can replace the target effect code in the data file with 0. This technique has been introduced in the previous section for unknown parent groups. The alternative data file (`data_mr04a1.txt`) provides the same solutions as shown in the textbook.
82 | 
83 | In the textbook, the author puts constraints on HYS 1 and 3. The 5th column in the above file has 0 for HYS 1 and 3. You can rewrite the parameter file to read the new data file; just change the `DATAFILE` block. Run BLUPF90 with the modified parameter file, and you can see different solutions.
84 | 
85 | Be careful if you use the manual constraints. If you put too many constraints on the equation, the solutions will make no sense.
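
The recoding described above can be scripted. The sketch below uses `awk` on a made-up toy file (`toy_rep.txt`, with the 6-column layout of `data_mr04a.txt`; the values are hypothetical) and replaces HYS codes 1 and 3 in column 5 with 0. Applying the same command to `data_mr04a.txt` should give a file equivalent to `data_mr04a1.txt`, assuming that file differs only in column 5.

```shell
# Toy file: animal, sire, dam, parity, HYS, fat yield (values are made up).
cat > toy_rep.txt <<'EOF'
4 1 2 1 1 201
4 1 2 2 3 280
5 3 2 1 2 160
5 3 2 2 4 190
EOF

# Set the HYS code (column 5) to 0 for levels 1 and 3; print every row.
awk '$5 == 1 || $5 == 3 { $5 = 0 } { print }' toy_rep.txt > toy_rep_constrained.txt
cat toy_rep_constrained.txt
```

Note that `awk` rewrites a modified row with single spaces between columns, which is fine for a space-delimited data file.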
86 | 


--------------------------------------------------------------------------------
/introduction_condition.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Introduction
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | About the software
10 | ==================
11 | 
12 | Condition of use
13 | ----------------
14 | 
15 | For detailed information about the conditions of use and support, please visit the official wiki (<http://nce.ads.uga.edu/wiki/doku.php>). The programs are free for research use, but publications should acknowledge the use of the programs. Please contact the UGA faculty (see <http://nce.ads.uga.edu/> for contact information).
16 | 
17 | Bug report and support
18 | ----------------------
19 | 
20 | Bug reports are welcome. Please contact one of the developers at UGA or other institutes, or post your report on Groups.io (<https://groups.io/g/blupf90>). Support may be limited on Groups.io because it is provided on a volunteer basis. Note that we have moved from Yahoo Groups to Groups.io due to changes in Yahoo's policy. If you need full support for the programs, please consider visiting our team.
21 | 
22 | Citations
23 | ---------
24 | 
25 | You can cite the following publication when you use the BLUPF90 programs in your study. Please note that this manual is occasionally updated, but you can cite the version published in 2014.
26 | 
27 | * Misztal, I., S. Tsuruta, D. A. L. Lourenco, Y. Masuda, I. Aguilar, A. Legarra, Z. Vitezica. 2014. Manual for BLUPF90 family programs. University of Georgia. <http://nce.ads.uga.edu/wiki/doku.php?id=documentation> (This is the standard reference usually cited by users.)
28 | 
29 | If you cite this tutorial in your publication, please use the following reference *in addition to the manual listed above*. Please also note that you may cite the original version of the tutorial published in 2018, although it has been revised several times in the past and may be updated again in the future.
30 | 
31 | * Masuda, Y. 2018. Introduction to BLUPF90 suite programs. University of Georgia. <http://nce.ads.uga.edu/wiki/doku.php?id=documentation>
32 | 
33 | If you need to cite a particular feature of the BLUPF90 programs, the following references may be useful. Please be aware that new articles on this topic may be published in the future, and you are encouraged to cite the most recent one in your publications.
34 | 
35 | * Lourenco, D. A. L., Tsuruta, S., Masuda, Y., Bermann, M., Legarra, A., Misztal, I. Recent updates in the BLUPF90 software suite. In: Proceedings of the 12th World Congress on Genetics Applied to Livestock Production; 2022 July 3-8; Rotterdam.
36 | 
37 | * Lourenco, D. A. L., Legarra, A., Tsuruta, S., Masuda, Y., Aguilar, I., Misztal, I. 2020. Single-Step Genomic Evaluations from Theory to Practice: Using SNP Chips and Sequence Data in BLUPF90. Genes, 11:790. <https://doi.org/10.3390/genes11070790>
38 | 
39 | The following references are older, but they may still be useful for referring to specific features of the program.
40 | 
41 | * Aguilar, I., S. Tsuruta, Y. Masuda, D. A. L. Lourenco, A. Legarra, I. Misztal. 2018. BLUPF90 suite of programs for animal breeding with focus on genomics. No. 11.751. The 11th World Congress of Genetics Applied to Livestock Production, Auckland, New Zealand. (This is the most recent reference for BLUPF90 programs focusing on genomic analysis.)
42 | 
43 | * Aguilar, I., I. Misztal, S. Tsuruta, A. Legarra, H. Wang. 2014. PREGSF90 - POSTGSF90: Computational Tools for the Implementation of Single-step Genomic Selection and Genome-wide Association with Ungenotyped Individuals in BLUPF90 Programs. The 10th World Congress of Genetics Applied to Livestock Production, Vancouver, BC, Canada. (This can be cited if you used a combination of PREGSF90 and POSTGSF90 to perform single-step GWAS.)
44 | 
45 | * Misztal, I., S. Tsuruta, T. Strabel, B. Auvray, T. Druet, D. H. Lee. 2002. BLUPF90 and related programs (BGF90). Communication No. 28-07. The 7th World Congress on Genetics Applied to Livestock Production, August 19–23, 2002, Montpellier, France. (This was widely cited in the past, but is now considered outdated. Use this reference only if you have a specific reason.)
46 | 
47 | The following reference is cited in the technical descriptions in the subsequent chapters:
48 | 
49 | * Misztal, I. 2024. Computational techniques in animal breeding. Course Note. University of Georgia.
50 | 


--------------------------------------------------------------------------------
/installation_linux.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Download and Installation
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Running a program in Linux and macOS
10 | ====================================
11 | 
12 | Command line software
13 | ---------------------
14 | 
15 | BLUPF90 programs are command-line software. Each program reads a parameter file describing the names of the data and pedigree files, the model, and the covariance components used in the analysis. It displays logs on the screen and writes one or more output files. These programs are designed to be used in a shell environment with keyboard input.
16 | 
17 | Here, we assume a standard bash environment. For macOS, all operations will be performed in Terminal. No graphical user interface (GUI) is used.
18 | 
19 | **Note:** Currently, only Intel-based Macs are supported. The recent Macs with Apple M-series (ARM-based) CPUs are not supported.
20 | 
21 | Installation
22 | ------------
23 | 
24 | No special installation procedure is required. Just download the program from the official website. Then, change the file permissions using `chmod`, for example, `chmod 755` or `chmod u+x` (a helpful way to remember: _ch_ange the _mod_e to add execution permission _+x_ for the _u_ser). You can move the program to your preferred directory, which may already be listed in your `PATH` environment variable. If not, you can create such a directory and move the program there.
25 | 
26 | Here is an example:
27 | 
28 | ~~~~~{language=shell}
29 | # 1. Download the program. You can also use curl or another method.
30 | wget http://nce.ads.uga.edu/html/projects/programs/Mac_OSX/new/blupf90
31 | 
32 | # 2. Change the permission to make it executable.
33 | chmod 755 blupf90
34 | 
35 | # 3. Create a directory for binaries, if it doesn't exist.
36 | mkdir -p ~/bin
37 | 
38 | # 4. Move the program to that directory.
39 | mv blupf90 ~/bin
40 | ~~~~~
41 | 
42 | You can use other tools such as `curl` instead of `wget` to download the file. If the directory (e.g., `~/bin`) is not listed in your `PATH` environment variable, you need to add it. Check the current `PATH` using:
43 | 
44 | ~~~~~{language=shell}
45 | echo $PATH
46 | ~~~~~
47 | 
48 | If your directory is listed, you’re all set. If not, add it by typing:
49 | 
50 | ~~~~~{language=shell}
51 | export PATH=~/bin:$PATH
52 | ~~~~~
53 | 
54 | If the program is located in a different directory, replace `~/bin` with the correct path. This setting is temporary and will disappear when you log out. To make it permanent, add the command to your `.bash_profile`. If you're unsure how to do this, open `.bash_profile` with a text editor and add the line to the end. Log out and log back in for the change to take effect.
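
If the line is placed in `.bash_profile`, a slightly more defensive variant adds the directory only when it is missing, so that `PATH` does not grow each time the file is read. The following is a minimal sketch, assuming the programs are in `~/bin` as in the example above:

```shell
# Add $HOME/bin to PATH only if it is not already listed.
case ":$PATH:" in
  *":$HOME/bin:"*) echo "$HOME/bin is already in PATH" ;;
  *) export PATH="$HOME/bin:$PATH" ;;
esac
```

Surrounding `PATH` with colons in the test lets the pattern match the directory at the beginning, in the middle, or at the end of the list.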
55 | 
56 | Running a program
57 | -----------------
58 | 
59 | Navigate to the directory that contains all the required files (e.g., pedigree, data). Then, type the program name to start it. If the program launches successfully, you will see a message like:
60 | 
61 | ~~~~~{language=shell}
62 | name of parameter file?
63 | ~~~~~
64 | 
65 | At this prompt, type the name of the parameter file. Some programs, especially Gibbs sampling ones, will require additional inputs. The messages will appear on screen (standard output), and you can save them to a file using redirection:
66 | 
67 | ~~~~~{language=shell}
68 | # Save all messages to a file.
69 | # The program still accepts input even if nothing is shown.
70 | blupf90 > out.txt
71 | 
72 | # Save messages to a file while also displaying them on screen.
73 | blupf90 | tee out.txt
74 | ~~~~~
75 | 
76 | You can also avoid manual input at the start of the program:
77 | 
78 | ~~~~~{language=shell}
79 | # Provide the parameter file name via echo.
80 | echo parameter.txt | blupf90
81 | 
82 | # Or use a file.
83 | echo parameter.txt > input
84 | blupf90 < input
85 | ~~~~~
86 | 
87 | The second method is especially useful for Gibbs sampling programs that require several lines of input.
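
As an illustration of the file-based method with several input lines, the sketch below prepares an input file with a here-document. The three answers (parameter file name, number of samples with burn-in, and thinning interval) are an assumption about what a Gibbs sampling program such as `gibbs2f90` asks, and the values are hypothetical; check the prompts of the program you actually run.

```shell
# Write the answers to the prompts, one per line (values are hypothetical).
cat > gibbs_input.txt <<'EOF'
parameter.txt
10000 1000
10
EOF

# Run only if the program is available in PATH; keep the messages in a file.
if command -v gibbs2f90 >/dev/null 2>&1; then
  gibbs2f90 < gibbs_input.txt | tee gibbs_out.txt
else
  echo "gibbs2f90 not found in PATH; input file prepared only"
fi
```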
88 | 
89 | Stop a program
90 | --------------
91 | 
92 | To stop the program immediately, press `Ctrl + C`. For convenience, you can hold down `Ctrl` and tap `C`. The program will terminate immediately, and any resulting messages can be safely ignored. In some cases, especially during multi-threaded computations, the program may continue running after receiving the stop signal. If that happens, simply close the terminal window to stop the program completely.
93 | 


--------------------------------------------------------------------------------
/references.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: References
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | - Chen, C.Y., Misztal, I., Aguilar, I., Legarra, A. and Muir, W.M., 2011. Effect of different genomic relationship matrices on accuracy and scale. Journal of Animal Science, 89(9), pp.2673-2679.
10 | - Christensen, O.F., Madsen, P., Nielsen, B., Ostersen, T. and Su, G., 2012. Single-step methods for genomic evaluation in pigs. Animal, 6(10), pp.1565-1571.
11 | - Foulley, J.L., Gianola, D. and Thompson, R., 1983. Prediction of genetic merit from data on binary and quantitative variates with an application to calving difficulty, birth weight and pelvic opening. Genetics Selection Evolution, 15(3), p.1.
12 | - Hoeschele, I., Tier, B. and Graser, H.U., 1995. Multiple-trait genetic evaluation for one polychotomous trait and several continuous traits with missing data and unequal models. Journal of animal science, 73(6), pp.1609-1627.
13 | - Janss, L.L.G. and Foulley, J.L., 1993. Bivariate analysis for one continuous and one threshold dichotomous trait with unequal design matrices and an application to birth weight and calving difficulty. Livestock Production Science, 33(3), pp.183-198.
14 | - Legarra, A., Bertrand, J.K., Strabel, T., Sapp, R.L., Sanchez, J.P. and Misztal, I., 2007. Multi-breed genetic evaluation in a Gelbvieh population. Journal of Animal Breeding and Genetics, 124(5), pp.286-295.
15 | - Lourenco, D.A.L., Tsuruta, S., Fragomeni, B.O., Masuda, Y., Aguilar, I., Legarra, A., Bertrand, J.K., Amen, T.S., Wang, L., Moser, D.W. and Misztal, I., 2015. Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus. Journal of Animal Science, 93(6), pp.2653-2662.
16 | - Masuda, Y., Aguilar, I., Tsuruta, S. and Misztal, I., 2015. Technical note: Acceleration of sparse operations for average-information REML analyses with supernodal methods and sparse-storage refinements. Journal of Animal Science, 93(10), pp.4670-4674.
17 | - Masuda, Y., Baba, T. and Suzuki, M., 2014. Application of supernodal sparse factorization and inversion to the estimation of (co) variance components by residual maximum likelihood. Journal of Animal Breeding and Genetics, 131(3), pp.227-236.
18 | - Misztal, I., Tsuruta, S., Aguilar, I., Legarra, A., VanRaden, P.M. and Lawlor, T.J., 2013. Methods to approximate reliabilities in single-step genomic evaluation. Journal of dairy science, 96(1), pp.647-654.
19 | - Misztal, I. and Wiggans, G.R., 1988. Approximation of prediction error variance in large-scale animal models. Journal of Dairy Science, 71, pp.27-32.
20 | - Misztal, I., 2008. Reliable computing in estimation of variance components. Journal of Animal Breeding and Genetics, 125(6), pp.363-370.
21 | - Mrode, R.A., 2014. *Linear Models for the Prediction of Animal Breeding Values*. Third Edition. CAB International, Wallingford, Oxon, UK.
22 | - Powell, J.E., Visscher, P.M. and Goddard, M.E., 2010. Reconciling the analysis of IBD and IBS in complex trait studies. Nature Reviews Genetics, 11(11), pp.800-805.
23 | - Sanchez, J.P., Misztal, I., Aguilar, I. and Bertrand, J.K., 2008. Genetic evaluation of growth in a multibreed beef cattle population using random regression-linear spline models. Journal of animal science, 86(2), pp.267-277.
24 | - Sorensen, D.A., Andersen, S., Gianola, D. and Korsgaard, I., 1995. Bayesian inference in threshold models using Gibbs sampling. Genetics Selection Evolution, 27(3), p.1.
25 | - Strabel, T., Misztal, I. and Bertrand, J.K., 2001. Approximation of reliabilities for multiple-trait model with maternal effects. Journal of Animal Science, 79(4), pp.833-839.
26 | - Stranden, I. and Lidauer, M., 1999. Solving large mixed linear models using preconditioned conjugate gradient iteration. Journal of dairy science, 82(12), pp.2779-2787.
27 | - Tsuruta, S., Misztal, I. and Stranden, I., 2001. Use of the preconditioned conjugate gradient algorithm as a generic solver for mixed-model equations in animal breeding applications. Journal of Animal Science, 79(5), pp.1166-1172.
28 | - Vitezica, Z.G., Aguilar, I., Misztal, I. and Legarra, A., 2011. Bias in genomic predictions for populations under selection. Genetics Research, 93(05), pp.357-366.
29 | - Wang, H., Misztal, I., Aguilar, I., Legarra, A. and Muir, W.M., 2012. Genome-wide association mapping including phenotypes from relatives without genotypes. Genetics Research, 94(02), pp.73-83.
30 | - Wang, C.S., Quaas, R.L. and Pollak, E.J., 1997. Bayesian analysis of calving ease scores and birth weights. Genetics Selection Evolution, 29(2), p.1.
31 | 


--------------------------------------------------------------------------------
/installation_env.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Download and Installation
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Additional settings
10 | ===================
11 | 
12 | The BLUPF90 programs may use a large amount of memory, especially when the dataset is large and a genomic model is applied. Heavy memory consumption often leads to a program crash with a *segmentation fault* (or *bus error*). A segmentation fault occurs when a program attempts to access memory in an inappropriate way or location. Note that this error can happen even if your computer has a lot of free memory. This is because the operating system (Linux, macOS, or Windows) imposes limitations on the memory usage of each program. More specifically, the system limits the amount of memory called the *stack*, and our programs use a large stack memory to achieve maximum performance.
13 | 
14 | This is a common issue with large datasets. You may encounter it when updating to a newer version of the program while using the same input files. Even a small code change can affect memory usage behavior.
15 | 
16 | You can either remove the limitation or increase the stack size. The following settings are recommended even if you haven't encountered this problem yet. More details are available on our website: <http://nce.ads.uga.edu/wiki/doku.php?id=faq.segfault>.
17 | 
18 | Increase stack size
19 | -------------------
20 | 
21 | This setting is required on Linux and macOS. First, type the following command in your shell (terminal):
22 | 
23 | ~~~~~{language=shell}
24 | ulimit -s
25 | ~~~~~
26 | 
27 | If it shows `unlimited`, the configuration is fine. If you see a number (like `8192`), it is likely to be a problem. This number represents the stack size and it should be *unlimited*. To change the stack size, type the following command before running our programs:
28 | 
29 | ~~~~~{language=shell}
30 | ulimit -s unlimited
31 | ~~~~~
32 | 
33 | You can also add this command at the end of your `.bash_profile` in your home directory. This way, the setting will be applied automatically every time you log in.
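
On some systems, the request for an unlimited stack may be refused because of a hard limit set by the administrator or the operating system. The following is a minimal sketch of a fallback, not an official recommendation: it first asks for `unlimited` and, if that fails, raises the stack size to the largest value the system allows.

```shell
# Try to remove the limit on the stack size; if the system refuses,
# fall back to the largest value allowed (the hard limit).
ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)"

# Show the stack limit now in effect.
ulimit -s
```

If the last command still prints a number, that number is the upper bound available to you without administrator rights.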
34 | 
35 | Increase stack size for OpenMP
36 | ------------------------------
37 | 
38 | There is a separate setting to define the stack size for OpenMP. This value is independent of the system stack size explained above. You can set it using an environment variable.
39 | 
40 | ### Linux and macOS
41 | 
42 | To check the current value, type the following command in your shell:
43 | 
44 | ~~~~~{language=shell}
45 | echo $OMP_STACKSIZE
46 | ~~~~~
47 | 
48 | If it shows nothing, the variable is not set and the default value applies. The default is usually `4M`, which is likely insufficient; an explicitly set small value like `4M` may be too small as well. Before running the program, set it using the following command (no spaces around `=`):
49 | 
50 | ~~~~~{language=shell}
51 | export OMP_STACKSIZE=64M
52 | ~~~~~
53 | 
54 | This sets the stack size to 64 megabytes. If the program still fails with the same error, increase the size gradually (e.g., to 128M, 192M, etc.). However, a large value can lead to high memory usage, because each thread can allocate this much stack. There is no universally appropriate value — it depends on your system, so adjust empirically.
55 | 
56 | For a one-time setting, you can add the variable inline when you run the program (no `export`):
57 | 
58 | ~~~~~{language=shell}
59 | OMP_STACKSIZE=64M ./blupf90
60 | ~~~~~
61 | 
62 | As with `ulimit`, you can save this setting permanently in your `.bash_profile`.
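
When the setting goes into `.bash_profile`, one option is to define a default only if the variable is not already set, so that a value inherited from the environment is not overwritten. A minimal sketch, using `64M` as the example value from above:

```shell
# Keep the existing value if OMP_STACKSIZE is set and non-empty; otherwise use 64M.
export OMP_STACKSIZE="${OMP_STACKSIZE:-64M}"
echo "OMP_STACKSIZE=$OMP_STACKSIZE"
```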
63 | 
64 | ### Windows
65 | 
66 | On Windows, you can set the `OMP_STACKSIZE` environment variable either temporarily via the Command Prompt or permanently via the system settings in Control Panel.
67 | 
68 | To check the current value, type:
69 | 
70 | ~~~~~{language=shell}
71 | echo %OMP_STACKSIZE%
72 | ~~~~~
73 | 
74 | To set a value (e.g., 64 megabytes), type:
75 | 
76 | ~~~~~{language=shell}
77 | set OMP_STACKSIZE=64M
78 | ~~~~~
79 | 
80 | This setting lasts only for the current session, so you must run the BLUPF90 programs in the same session after setting it. To make it permanent, use the Control Panel. For instructions, search for keywords like `windows environment variable` using a search engine.
81 | 
82 | The value should include a number and unit, such as `64M` for 64 megabytes. Start with `64M`, and if the problem persists, increase the value (e.g., `128M` or more). As with other systems, a value that is too large will result in high memory consumption because each thread may use that much memory. You will need to determine an appropriate value based on your computer's resources.
83 | 


--------------------------------------------------------------------------------
/introduction_short.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Introduction
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Short introduction to BLUPF90 programs
10 | ======================================
11 | 
12 | What is BLUPF90?
13 | ----------------
14 | 
15 | BLUPF90 is the name of the software. It is also the name of a collection of programs derived from BLUPF90. In the latter case, we will refer to it as the *BLUPF90 family* or *BLUPF90 programs*. A concise description can also be found on the official wiki at the University of Georgia (<http://nce.ads.uga.edu/wiki/doku.php>).
16 | 
17 | >    BLUPF90 family of programs is a collection of software in Fortran 90/95 for mixed  
18 | >    model computations in animal breeding. The goal of the software is to be as simple as  
19 | >    with a matrix package and as efficient as in a programming language.
20 | 
21 | The BLUPF90 program creates and solves mixed model equations. It supports various models including the animal model, maternal model, and random regression model with multiple traits. The BLUPF90 family includes several programs for variance component estimation using REML and Gibbs sampling with various models. The set of programs is available on the official website (<http://nce.ads.uga.edu>) and can be freely used for academic or research purposes.
22 | 
23 | Why BLUPF90?
24 | ------------
25 | 
26 | BLUPF90 has several advantages over similar software for users: simplicity, stability, active development, and speed. For programmers, the internal structure is documented in a course note (<https://nce.ads.uga.edu/wiki/doku.php?id=courses>), and the working source code is available, so a developer can modify the program to support new ideas.
27 | 
28 | We will now look at its advantages from a user's point of view.
29 | 
30 | ### Simplicity ###
31 | 
32 | The program's behavior is very simple. Every BLUPF90 program reads a parameter file, which describes the names of data and pedigree files, models, and (initial) variance components to be used in the analysis. The parameter file is a short text file containing a few pairs of keywords and values to describe the information. It is concise but capable of specifying general models. Once you learn how to write a parameter file, you can perform very complicated analyses with the program.
33 | 
34 | Each program saves the solutions of the mixed model equations (e.g., EBV) to a file. Estimated variance components are also saved to files permanently.
35 | 
36 | ### Stability ###
37 | 
38 | The programs have been tested by many researchers since their public release around 2000. They are now stable enough to be used for routine genetic evaluation at the national level. The team at the University of Georgia heavily uses the programs for their research.
39 | 
40 | ### Active development ###
41 | 
42 | The programs are continuously maintained by the UGA team. New features are added to incorporate new methodologies and to improve usability and computing time. The programs fully support single-step genomic BLUP (ssGBLUP); GWAS can be performed within the ssGBLUP framework.
43 | 
44 | ### Speed ###
45 | 
46 | We have been optimizing the programs for speed, not only by using multithreaded libraries (MKL) but also through detailed optimization of fundamental subroutines. The current version is remarkably faster than older versions, especially in REML and Gibbs sampling.
47 | 
48 | Is it easy?
49 | -----------
50 | 
51 | Yes, it is. But the learning process is not always easy. There may be several hurdles to learning how to use the BLUPF90 programs in actual research. This is the flip side of its simplicity.
52 | 
53 | * Limited documentation. The official website hosts a manual for the programs (<http://nce.ads.uga.edu/wiki/doku.php>). The official wiki also provides various information about the software. However, documentation, especially for beginners, is not fully available in the existing resources.
54 | * Data manipulation. This is not a specific disadvantage of BLUPF90 but is common to similar software. The BLUPF90 family focuses on mixed model analyses and does not provide a data manipulation framework like R or SAS. Since the programs use text files as input, some text processing may be needed. You should prepare data and pedigree files in a specific format and check for erroneous records before running the programs.
55 | * Pre-processing. Every program accepts only numerical values to maintain simplicity in programming. If your data or pedigree file contains characters (letters or symbols), you must replace them with integer codes before analysis. One of the programs, RENUMF90, can perform this task. For field or commercial data, you will need to run RENUMF90 before using BLUPF90.
56 | 
57 | The purpose of this tutorial is to provide such documentation with many examples.
58 | 


--------------------------------------------------------------------------------
/introduction_about.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Introduction
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | About this tutorial
 10 | ===================
 11 | 
 12 | General information
 13 | -------------------
 14 | 
 15 | This document explains how to describe a model in a parameter file used with the BLUPF90 family of programs, how to prepare data and pedigree files, and how to perform genomic analyses with the programs. This tutorial assumes that the reader has sufficient knowledge of linear mixed models and data manipulation techniques on a computer. Experience with similar software will also be helpful for understanding the contents.
 16 | 
 17 | This tutorial introduces the features of the BLUPF90 programs step by step. In particular, the first section of each chapter presents essential features using a very simple example. Later sections introduce new concepts through more practical examples. We recommend that readers do not skip any sections, even if they seem too elementary.
 18 | 
 19 | This tutorial complements the official manual and wiki pages. Because the official manual serves more as a reference, it presents each topic concisely. For advanced topics or technical tricks, please refer to the manual. Conversely, if you find that something is missing in the manual, this tutorial may help you understand the background. If you have a question that is not addressed in the manual, the wiki, or this tutorial, please ask someone in the Yahoo Group (`https://groups.yahoo.com/neo/groups/blupf90/info`). You may find answers in the historical discussion logs of the forum.
 20 | 
 21 | Examples in each section
 22 | ------------------------
 23 | 
 24 | ### Example data
 25 | 
 26 | In each section, I will provide various examples for the reader. The files are available at the author's GitHub repository (`https://github.com/masuday/data`). Examples in this tutorial are enclosed in a frame, as shown below.
 27 | 
 28 | ~~~~~{language=text caption="example.txt"}
 29 | 1   1   2.5
 30 | 2   1   1.8
 31 | 3   1   4.2
 32 | 4   1   2.2
 33 | 5   1   3.6
 34 | ~~~~~
 35 | 
 36 | ### Parameter file for BLUPF90
 37 | 
 38 | Parameter files used in the BLUPF90 programs are highlighted as follows.
 39 | 
 40 | ~~~~~{language=blupf90 caption="param0.txt"}
 41 | # This is a BLUPF90 parameter file.
 42 | DATAFILE
 43 | data0.txt
 44 | NUMBER_OF_TRAITS
 45 | 1
 46 | NUMBER_OF_EFFECTS
 47 | 1
 48 | OBSERVATION(S)
 49 | 1
 50 | WEIGHT(S)
 51 | 
 52 | EFFECTS:
 53 | 2 3 cross
 54 | RANDOM_RESIDUAL VALUES
 55 | 1.0
 56 | ~~~~~
 57 | 
 58 | ### Parameter file for renumbering by RENUMF90
 59 | 
 60 | Alternative parameter files used in RENUMF90 (referred to as *instruction files* in this tutorial) are shown as follows.
 61 | 
 62 | ~~~~~{language=renumf90 caption="renum1.txt"}
 63 | # This is an example of a renum-parameter file.
 64 | DATAFILE
 65 | rawdata1.txt
 66 | TRAITS
 67 | 5
 68 | FIELDS_PASSED TO OUTPUT
 69 | 
 70 | WEIGHT(S)
 71 | 
 72 | RESIDUAL_VARIANCE
 73 | 1.0
 74 | EFFECT          # 1st effect
 75 | 2 cross alpha
 76 | EFFECT          # 2nd effect
 77 | 3 cross alpha
 78 | EFFECT          # 3rd effect
 79 | 4 cov
 80 | ~~~~~
 81 | 
 82 | ### Output
 83 | 
 84 | The output from the programs is shown in a different format.
 85 | 
 86 | ~~~~~{language=output}
 87 |  name of parameter file?
 88 | ~~~~~
 89 | 
 90 | ### Commands on shell
 91 | 
 92 | If you need to type a command in your shell (terminal or Command Prompt), the command line is shown as follows.
 93 | 
 94 | ~~~~~{language=shell}
 95 | ./airemlf90 renf90.par
 96 | ~~~~~
 97 | 
 98 | 
 99 | Disclaimer
100 | ----------
101 | 
102 | The following disclaimer is based on the MIT License, although this tutorial is distributed under the Creative Commons BY-NC-ND 4.0 International license.
103 | 
104 | > This tutorial is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement.
105 | > In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the tutorial or the use or other dealings in the tutorial.
106 | 
107 | I wrote this documentation in Markdown using Pandoc extensions. The PDF file is generated using Pandoc, so there may be typesetting or layout issues.
108 | 
109 | TODO
110 | ----
111 | 
112 | These changes can be made in the next edition.
113 | 
114 | - Fully revise the descriptions to support the new programs `blupf90+` and `gibbsf90+`.
115 | - Fully revise the chapters related to Dr. Mrode's textbook. The textbook has been updated (now in its 4th edition, "Linear Models for the Prediction of the Genetic Merit of Animals"), and a new author, Ivan Pocrnić, has joined. The numerical examples are now freely available online, and the revised chapters will make use of these materials.
116 | - Add additional explanations on the usage of other programs, such as `idsolf90`.
117 | 


--------------------------------------------------------------------------------
/mrode_c11ex116_ssgblup.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Single-step approach
 10 | ====================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | We consider the standard animal model
 16 | $$
 17 | \mathbf{y}=\mathbf{Xb}+\mathbf{Zu}+\mathbf{e}.
 18 | $$
 19 | If some animals are genotyped, their additive relationships are described by the genomic relationship matrix ($\mathbf{G}$). When the genotyped and the non-genotyped animals are considered simultaneously in one relationship matrix, the resulting matrix is $\mathbf{H}$. Its inverse has a simple form:
 20 | $$
 21 | \mathbf{H}^{-1}
 22 | =
 23 | \mathbf{A}^{-1}
 24 | +
 25 | \left[
 26 | \begin{array}{cc}
 27 | \mathbf{0}&\mathbf{0}\\
 28 | \mathbf{0}&\mathbf{G}^{-1}-\mathbf{A}_{22}^{-1}
 29 | \end{array}
 30 | \right]
 31 | $$
 32 | This $\mathbf{G}^{-1}$ is usually blended with the pedigree matrix ($\mathbf{G}^{-1}_{w}$ shown in the previous section). The system of mixed model equations is the same as the standard animal model with $\mathbf{H}^{-1}$ instead of $\mathbf{A}^{-1}$:
 33 | $$
 34 | \left[
 35 | \begin{array}{ll}
 36 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\
 37 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z}+\mathbf{H}^{-1}/\sigma_u^2
 38 | \end{array}
 39 | \right]
 40 | \left[
 41 | \begin{array}{c}
 42 | \mathbf{\hat{b}}\\
 43 | \mathbf{\hat{u}}
 44 | \end{array}
 45 | \right]
 46 | =
 47 | \left[
 48 | \begin{array}{l}
 49 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\
 50 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \\
 51 | \end{array}
 52 | \right]
 53 | $$
 54 | where $\sigma_u^2 = 35.241$ and $\sigma_e^2 = 245.0$ in this case. BLUPF90 fully supports ssGBLUP with a minimal description in the parameter file.
 55 | 
 56 | 
 57 | Files
 58 | -----
 59 | 
 60 | The data file (`data_mr11e.txt`) is different from the previous ones. The pedigree file (`pedigree_mr11e.txt`) is the same as in the previous analysis.
 61 | 
 62 | The SNP marker file is unique for this analysis.
 63 | 
 64 | ~~~~~{language=text caption="snp_mr11e.txt"}
 65 |  18 11010202210000000000000000000000000000000000000000
 66 | ...
 67 | ~~~~~
 68 | 
 69 | The corresponding cross-reference file is as follows.
 70 | 
 71 | ~~~~~{language=text caption="snp_mr11e_XrefID.txt"}
 72 | 18 18
 73 | 19 19
 74 | 20 20
 75 | 21 21
 76 | 22 22
 77 | 23 23
 78 | 24 24
 79 | 25 25
 80 | 26 26
 81 | ~~~~~
 82 | 
 83 | The parameter file is shown as follows.
 84 | 
 85 | ~~~~~{language=blupf90 caption="param_mr11e.txt"}
 86 | DATAFILE
 87 | data_mr11e.txt
 88 | NUMBER_OF_TRAITS
 89 | 1
 90 | NUMBER_OF_EFFECTS
 91 | 2
 92 | OBSERVATION(S)
 93 | 6
 94 | WEIGHT(S)
 95 | 5
 96 | EFFECTS:
 97 | 4  1 cross
 98 | 1 26 cross
 99 | RANDOM_RESIDUAL VALUES
100 | 245.0
101 | RANDOM_GROUP
102 | 2
103 | RANDOM_TYPE
104 | add_animal
105 | FILE
106 | pedigree_mr11e.txt
107 | (CO)VARIANCES
108 | 35.241
109 | OPTION SNP_file snp_mr11e.txt snp_mr11e_XrefID.txt
110 | OPTION no_quality_control
111 | OPTION AlphaBeta 0.95 0.05
112 | OPTION tunedG 0
113 | OPTION thrStopCorAG 0.10
114 | OPTION solv_method FSPAK
115 | ~~~~~
116 | 
117 | BLUPF90 (actually the embedded genomic routine) may stop because of the very low correlation between the diagonals of $\mathbf{G}$ and $\mathbf{A}_{22}$. This correlation is usually high; otherwise, there may be a problem with the quality of the genotypes or the pedigree. It is low in this case because the data set is small. The option `thrStopCorAG` lowers the threshold so that the program does not stop because of the low correlation.
118 | 
119 | 
120 | Solutions
121 | ---------
122 | 
123 | Unfortunately, the solutions are totally different from the reference values in the textbook (p.193).
124 | 
125 | ~~~~~{language=text caption="solutions"}
126 | trait/effect level  solution
127 |    1   1         1          8.38509553
128 |    1   2         1         -0.27072327
129 |    1   2         2          2.90677899
130 |    1   2         3         -0.27072327
131 |    1   2         4          2.58838142
132 |    1   2         5         -2.59488845
133 |    1   2         6         -1.88195674
134 |    1   2         7         -0.99299119
135 |    1   2         8         -1.02617193
136 |    1   2         9         -3.14377983
137 |    1   2        10         -1.69066025
138 |    1   2        11         -3.31615787
139 |    1   2        12          0.81555256
140 |    1   2        13          0.63918948
141 |    1   2        14          4.85991512
142 |    1   2        15          4.20216687
143 |    1   2        16          6.46125192
144 |    1   2        17         -1.79124924
145 |    1   2        18         -0.39297755
146 |    1   2        19          1.47720048
147 |    1   2        20         -2.90484503
148 |    1   2        21         -0.54144654
149 |    1   2        22          0.89069967
150 |    1   2        23         -2.54427924
151 |    1   2        24         -0.10603281
152 |    1   2        25          0.94047078
153 |    1   2        26          3.65328640
154 | ~~~~~
155 | 


--------------------------------------------------------------------------------
/mrode_c05ex051_mt_equal_design.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Multiple-trait model with equal design matrix and no missing observations
 10 | =========================================================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | Here we consider a multiple-trait model with equal design matrices, which means the same model is applied to all traits. We will consider a situation without missing observations.
 16 | 
 17 | In this example, the author considers a 2-trait model with the pre-weaning gain as the first trait (WWG) and the post-weaning gain as the second trait (PWG). Both traits have the same model, that is, the fixed effect of sex and two random effects: the additive genetic effect and the residual effect.
 18 | 
 19 | We consider the genetic covariance between the two traits as well as the genetic variance for each trait. The same is true for the residual (co)variance components. The genetic covariance matrix ($\mathbf{G}_0$) and the residual covariance matrix ($\mathbf{R}_0$) are both symmetric $2 \times 2$ matrices. The values are defined in the textbook as
 20 | $$
 21 | \mathbf{G}_{0}
 22 | =
 23 | \left[
 24 | \begin{array}{rr}
 25 | 20&18\\
 26 | 18&40
 27 | \end{array}
 28 | \right]
 29 | \quad
 30 | \text{and}
 31 | \quad
 32 | \mathbf{R}_{0}
 33 | =
 34 | \left[
 35 | \begin{array}{rr}
 36 | 40&11\\
 37 | 11&30
 38 | \end{array}
 39 | \right]
 40 | $$
 41 | 
 42 | The matrix notation of the above model and its mixed model equations have already been introduced in the previous chapter, so here we show only the relevant equations. The model can be written as
 43 | $$
 44 | \mathbf{y} = \mathbf{Xb} + \mathbf{Zu} + \mathbf{e}.
 45 | $$
 46 | We have 2 options to order the observations in $\mathbf{y}$ (within-trait or within-animal). As discussed in the previous chapter, BLUPF90 orders the observations within-animal. The variance of $\mathbf{y}$ is
 47 | $$
 48 | \mathrm{var}(\mathbf{y}) = \mathbf{Z}\left(\mathbf{A} \otimes \mathbf{G}_{0}\right)\mathbf{Z}' + \mathbf{I} \otimes \mathbf{R}_{0}
 49 | $$
 50 | where $\otimes$ is the operator for Kronecker product. The mixed model equations can be written as
 51 | $$
 52 | \left[
 53 | \begin{array}{ll}
 54 | \mathbf{X}'(\mathbf{I}\otimes\mathbf{R}_{0}^{-1})\mathbf{X} & \mathbf{X}'(\mathbf{I}\otimes\mathbf{R}_{0}^{-1})\mathbf{Z} \\
 55 | \mathbf{Z}'(\mathbf{I}\otimes\mathbf{R}_{0}^{-1})\mathbf{X} & \mathbf{Z}'(\mathbf{I}\otimes\mathbf{R}_{0}^{-1})\mathbf{Z} + \mathbf{A}^{-1}\otimes\mathbf{G}^{-1}_{0}
 56 | \end{array}
 57 | \right]
 58 | \left[
 59 | \begin{array}{c}
 60 | \mathbf{\hat{b}}\\
 61 | \mathbf{\hat{u}}
 62 | \end{array}
 63 | \right]
 64 | =
 65 | \left[
 66 | \begin{array}{l}
 67 | \mathbf{X}'(\mathbf{I}\otimes\mathbf{R}_{0}^{-1})\mathbf{y} \\
 68 | \mathbf{Z}'(\mathbf{I}\otimes\mathbf{R}_{0}^{-1})\mathbf{y}
 69 | \end{array}
 70 | \right]
 71 | $$
 72 | 
 73 | 
 74 | Files
 75 | -----
 76 | 
 77 | The data file (`data_mr05a.txt`) is actually an extension of Example 3.1. It has an extra observation in the last column.
 78 | 
 79 | 1. Animal ID (calves)
 80 | 2. Sex
 81 | 3. Sire ID
 82 | 4. Dam ID
 83 | 5. Pre-weaning gain (WWG; kg)
 84 | 6. Post-weaning gain (PWG; kg)
 84 | 
 85 | The pedigree file (`pedigree_mr05a.txt`) is also the same as the previous one.
 86 | 
 87 | The parameter file contains 2 covariance matrices. We will compute the standard error of each estimate as the square root of the corresponding diagonal element of the inverse of the left-hand side of the mixed model equations.
 88 | 
 89 | ~~~~~{language=blupf90 caption="param_mr05a.txt"}
 90 | DATAFILE
 91 | data_mr05a.txt
 92 | NUMBER_OF_TRAITS
 93 | 2
 94 | NUMBER_OF_EFFECTS
 95 | 2
 96 | OBSERVATION(S)
 97 | 5 6
 98 | WEIGHT(S)
 99 | 
100 | EFFECTS:
101 | 2 2 2 cross
102 | 1 1 8 cross
103 | RANDOM_RESIDUAL VALUES
104 | 40.0 11.0
105 | 11.0 30.0
106 | RANDOM_GROUP
107 | 2
108 | RANDOM_TYPE
109 | add_animal
110 | FILE
111 | pedigree_mr05a.txt
112 | (CO)VARIANCES
113 | 20.0 18.0
114 | 18.0 40.0
115 | OPTION solv_method FSPAK
116 | OPTION sol se
117 | ~~~~~
118 | 
119 | Solutions
120 | ---------
121 | 
122 | The file `solutions` has the solutions for the first and second traits. The solutions are identical to the values shown in the textbook (p.74).
123 | 
124 | The reliability of the estimated breeding value of an animal can be calculated with the solutions from BLUPF90. The reliability for trait $j$ of animal $i$ ($r_{ij}^2$) is
125 | $$
126 | r_{ij}^2=1-\frac{\mathrm{PEV}_{ij}}{\sigma_{u_j}^{2}}
127 | $$
128 | where $\mathrm{PEV}_{ij}$ is the diagonal element of the inverse of the left-hand side of the mixed model equations corresponding to the effect considered, and $\sigma_{u_j}^{2}$ is the additive genetic variance for trait $j$. In the solutions file, the column `s.e.` contains the square root of this diagonal element ($\sqrt{\mathrm{PEV}_{ij}}$). For example, the reliabilities of animal 1 can be calculated as:
129 | $$
130 | r_{11}^2=1-\frac{4.313320612^2}{20}\approx 0.070
131 | \quad
132 | \text{and}
133 | \quad
134 | r_{12}^2=1-\frac{5.991809902^2}{40}\approx 0.102.
135 | $$
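
The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `reliability` is hypothetical, and the standard errors are the two values quoted in the text for animal 1 (as produced by `OPTION sol se`), paired with the diagonal elements of $\mathbf{G}_0$.

```python
def reliability(se, genetic_variance):
    """Return 1 - PEV / sigma_u^2, where PEV is the squared standard error."""
    pev = se ** 2
    return 1.0 - pev / genetic_variance


# Diagonal of G0 (genetic variances for WWG and PWG) from the parameter file.
genetic_variances = [20.0, 40.0]
# Standard errors of animal 1 for trait 1 and trait 2, as quoted in the text.
se_animal1 = [4.313320612, 5.991809902]

for trait, (se, var_u) in enumerate(zip(se_animal1, genetic_variances), start=1):
    print(f"trait {trait}: reliability = {reliability(se, var_u):.3f}")
# prints 0.070 for trait 1 and 0.102 for trait 2
```

The same function can be applied to every line of the `solutions` file that belongs to the animal effect, using the genetic variance of the trait given in the first column.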
136 | 


--------------------------------------------------------------------------------
/mrode_c09ex092_random_regression.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Random regression model
 10 | =======================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | Assume longitudinal data measured over time. If the genetic correlation between observations taken at 2 time points is lower than 1, the repeatability model is inappropriate. In this case, a multiple-trait model, which treats an observation measured in a different time period as a separate trait, could describe the data better. If the time interval is shorter, we will have more traits. As the interval approaches 0, we will ultimately have an infinite number of traits, and the genetic correlations can be described using a function of 2 time points. The random environmental effects have the same kind of covariance structure. This is equivalent to the random regression model.
 16 | 
 17 | In this example, we fit 2nd-order Legendre polynomials (3 coefficients including the intercept) to both the additive genetic and the permanent environmental effects. In this model, each animal has 3 random coefficients for the additive genetic effect and another 3 coefficients for the permanent environmental effect. See the textbook for details of the model.
 18 | 
 19 | There are correlations between the random regression coefficients. This resembles a maternal model with a covariance matrix for the direct and maternal genetic effects. In this model, we have 3 random regression coefficients for each of the random effects, so the variance components form a $3 \times 3$ matrix for the additive genetic effect ($\mathbf{G}_0$) and for the permanent environmental effect ($\mathbf{P}_0$). The actual covariance matrices are
 20 | $$
 21 | \mathbf{G}_{0}
 22 | =
 23 | \left[
 24 | \begin{array}{rrr}
 25 | 3.297&  0.594& -1.381\\
 26 | 0.594&  0.921& -0.289\\
 27 | -1.381& -0.289& 1.005
 28 | \end{array}
 29 | \right]
 30 | \quad
 31 | \text{and}
 32 | \quad
 33 | \mathbf{P}_{0}
 34 | =
 35 | \left[
 36 | \begin{array}{rrr}
 37 |  6.872& -0.254& -1.101\\
 38 | -0.254&  3.171&  0.167\\
 39 | -1.101&  0.167&  2.457
 40 | \end{array}
 41 | \right]
 42 | $$
 43 | and $\sigma_e^2=3.710$.
 44 | 
 45 | 
 46 | Files
 47 | -----
 48 | 
 49 | We use the same data and pedigree files as in the previous section (`data_mr09b.txt` and `pedigree_mr09b.txt`).
 50 | 
 51 | The parameter file is an extension of the previous one.
 52 | 
 53 | ~~~~~{language=blupf90 caption="param_mr09b.txt"}
 54 | DATAFILE
 55 | data_mr09b.txt
 56 | NUMBER_OF_TRAITS
 57 | 1
 58 | NUMBER_OF_EFFECTS
 59 | 12
 60 | OBSERVATION(S)
 61 | 4
 62 | WEIGHT(S)
 63 | 
 64 | EFFECTS:
 65 |  3 10 cross  # HTD
 66 |  5  1 cov    # Legendre polynomials (intercept) for fixed regression
 67 |  6  1 cov    # Legendre polynomials (1st order) for fixed regression
 68 |  7  1 cov    # Legendre polynomials (2nd order) for fixed regression
 69 |  8  1 cov    # Legendre polynomials (3rd order) for fixed regression
 70 |  9  1 cov    # Legendre polynomials (4th order) for fixed regression
 71 |  5  8 cov 1  # Legendre polynomials (intercept) for additive genetic effect
 72 |  6  8 cov 1  # Legendre polynomials (1st order) for additive genetic effect
 73 |  7  8 cov 1  # Legendre polynomials (2nd order) for additive genetic effect
 74 |  5  8 cov 1  # Legendre polynomials (intercept) for permanent environmental effect
 75 |  6  8 cov 1  # Legendre polynomials (1st order) for permanent environmental effect
 76 |  7  8 cov 1  # Legendre polynomials (2nd order) for permanent environmental effect
 77 | RANDOM_RESIDUAL VALUES
 78 | 3.710
 79 | RANDOM_GROUP
 80 | 7 8 9
 81 | RANDOM_TYPE
 82 | add_animal
 83 | FILE
 84 | pedigree_mr09b.txt
 85 | (CO)VARIANCES
 86 |  3.297  0.594 -1.381
 87 |  0.594  0.921 -0.289
 88 | -1.381 -0.289  1.005
 89 | RANDOM_GROUP
 90 | 10 11 12
 91 | RANDOM_TYPE
 92 | diagonal
 93 | FILE
 94 | 
 95 | (CO)VARIANCES
 96 |  6.872 -0.254 -1.101
 97 | -0.254  3.171  0.167
 98 | -1.101  0.167  2.457
 99 | OPTION solv_method FSPAK
100 | ~~~~~
101 | 
102 | The additive genetic effect is defined for effects 7, 8, and 9 and the permanent environmental effect is defined for effects 10, 11, and 12. Both have a similar definition so we just explain the additive effect. Now we look at effect 7:
103 | 
104 |     5    8 cov 1
105 | 
106 | This refers to column 5 as the intercept of the regression. Because we have 8 animals in total, we estimate 8 intercepts. The above statement means that column 5 is treated as a covariate and the regression is nested within the animal (column 1). The remaining 2 statements can be interpreted similarly. The corresponding `RANDOM_GROUP` statement contains 3 effects (7, 8, and 9), so we need a $3 \times 3$ genetic covariance matrix.
107 | 
108 | 
109 | Solutions
110 | ---------
111 | 
112 | The equations are not of full rank, so there are infinitely many solutions for the fixed effects. The solutions for the random effects are very similar to the values in the textbook, with slight differences due to numerical error.
113 | 
114 | Additive genetic random regressions are defined as effects 7, 8, and 9. For these effects, the number of levels is the maximum animal ID. With a random regression model, estimated breeding values for an animal are presented as regression coefficients. In this case, each animal has 3 coefficients. For example, breeding values for animal 3 are
115 | $$
116 | u_3
117 | =
118 | \left[
119 | \begin{array}{r}
120 | 0.13166310\\
121 | -0.02908467\\
122 | 0.07039573
123 | \end{array}
124 | \right].
125 | $$
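
To turn these coefficients into a breeding value at a particular time point, the coefficients are multiplied by the Legendre covariates for that time point and summed. The following Python sketch illustrates the idea. It assumes normalized Legendre polynomials on a time scale standardized to $[-1, 1]$; the function names and the day range (4 to 310) are assumptions for illustration only and are not taken from the data file, where the covariates are already stored in columns 5 to 7.

```python
import math


def standardize(day, day_min, day_max):
    """Map a time point to the interval [-1, 1]."""
    return 2.0 * (day - day_min) / (day_max - day_min) - 1.0


def legendre_covariates(t):
    """Normalized Legendre polynomials of order 0, 1, and 2 at t in [-1, 1]."""
    return [
        math.sqrt(0.5),
        math.sqrt(1.5) * t,
        math.sqrt(2.5) * 0.5 * (3.0 * t * t - 1.0),
    ]


def breeding_value(coefficients, day, day_min, day_max):
    """Sum of (covariate x random regression coefficient) at the given day."""
    t = standardize(day, day_min, day_max)
    return sum(p * c for p, c in zip(legendre_covariates(t), coefficients))


# Random regression solutions for animal 3 (effects 7, 8, and 9).
u3 = [0.13166310, -0.02908467, 0.07039573]

# Hypothetical time range; replace with the range used to build the covariates.
for day in (4, 157, 310):
    print(day, round(breeding_value(u3, day, 4, 310), 5))
```

If the covariates in the data file were generated with a different standardization or scaling, the function `legendre_covariates` must be changed accordingly; otherwise the predicted values will not match.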
126 | 


--------------------------------------------------------------------------------
/largescale_pcg.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Large-scale genetic evaluation
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Iteration on data with preconditioned conjugate gradient (PCG)
 10 | ==============================================================
 11 | 
 12 | Algorithm
 13 | ---------
 14 | 
 15 | Preconditioned conjugate gradient (PCG) is an iterative method for solving a system of linear equations. This method combines easily with the "iteration on data" technique. The intermediate state is kept in 4 vectors, and one iteration consists of updating these vectors. BLUP90IOD2 is a program implementing this algorithm. Here we introduce the basic ideas needed to understand what the program does. See Stranden and Lidauer (2000) and Tsuruta et al. (2001) for the detailed algorithm.
 16 | 
 17 | The mixed model equations can be written as
 18 | $$
 19 | \mathbf{Cx} = \mathbf{b}
 20 | $$
 21 | where $\mathbf{C}$ is the left-hand side matrix, $\mathbf{x}$ is the solution vector, and $\mathbf{b}$ is the right-hand side vector. If
 22 | we have a matrix $\mathbf{M}$ which is an approximation of $\mathbf{C}$, the above equations are equivalent to
 23 | $$
 24 | \mathbf{M}^{-1}\mathbf{Cx} = \mathbf{M}^{-1}\mathbf{b}.
 25 | $$
 26 | This matrix $\mathbf{M}$ is called a preconditioner. If $\mathbf{M = C}$, the equations are immediately solved. BLUPF90
 27 | uses $\mathbf{M} = \mathrm{diag}(\mathbf{C})$, so its inverse is easily calculated.
 28 | 
 29 | 
 30 | The residual is expressed as
 31 | $$
 32 | \mathbf{r} = \mathbf{b} - \mathbf{Cx}
 33 | $$
 34 | and the algorithm updates the solution in each round so that this residual becomes smaller. The convergence criterion with the current solution $\mathbf{\hat{x}}$ is
 35 | $$
 36 | \varepsilon = \frac{||\mathbf{b} - \mathbf{C\hat{x}}||^2}{||\mathbf{b}||^2}
 37 | $$
 38 | where $||\cdot||$ means the norm.
 39 | The default value of $\varepsilon$ in BLUP90IOD2 is $10^{-12}$.
 40 | 
 41 | Note that some other software, for example, MiX99, uses $\sqrt{\varepsilon}$. Users should be careful about the definition of the convergence criterion when switching from such software to BLUP90IOD2.
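
The algorithm described above can be sketched in plain Python as follows. This is a minimal illustration with a dense matrix and the diagonal preconditioner $\mathbf{M} = \mathrm{diag}(\mathbf{C})$, not the implementation used in the program, which never builds $\mathbf{C}$ explicitly. The function names and the small test system are hypothetical. The sketch assumes that $\mathbf{C}$ is symmetric and positive definite and that $\mathbf{b}$ is not a zero vector.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(C, v):
    """Product of a dense matrix (list of rows) and a vector."""
    return [dot(row, v) for row in C]


def pcg_diagonal(C, b, tol=1e-12, max_rounds=1000):
    """Solve Cx = b by PCG with M = diag(C); returns (x, rounds, eps)."""
    n = len(b)
    m_inv = [1.0 / C[i][i] for i in range(n)]   # inverse of the preconditioner
    x = [0.0] * n                               # starting solution
    r = list(b)                                 # residual b - Cx for x = 0
    z = [m_inv[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)                                 # search direction
    rz = dot(r, z)
    bb = dot(b, b)
    eps = 1.0
    for rnd in range(1, max_rounds + 1):
        w = matvec(C, p)
        alpha = rz / dot(p, w)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * w[i] for i in range(n)]
        eps = dot(r, r) / bb                    # ||b - Cx||^2 / ||b||^2
        if eps < tol:
            return x, rnd, eps
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_rounds, eps


# A small symmetric positive definite system used only for illustration.
C = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, rounds, eps = pcg_diagonal(C, b)
print(rounds, eps, x)
```

Each round needs only one product of $\mathbf{C}$ with a vector, which is why the method fits the "iteration on data" technique: the product can be accumulated while reading the data and pedigree files.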
 42 | 
 43 | 
 44 | Programs
 45 | --------
 46 | 
 47 | BLUP90IOD2 is the current program to perform the iteration on data with PCG. CBLUP90IOD can handle a threshold model or threshold-linear models with 1 threshold trait. BLUP90MBE is specialized in multibreed models with external information (see Legarra et al., 2007). In this example, we use the BLUP90IOD2 program.
 48 | 
 49 | A parallel version of BLUP90IOD2 is now available. BLUP90IOD2OMP1 is a program supporting parallel processing in reading data and pedigree files using OpenMP. This program is especially useful for a very large data set with a complicated model (such as a multiple-trait model). There is no advantage in using this program for a small or moderate data set. The usage of this program is the same as that of BLUP90IOD2.
 50 | 
 51 | 
 52 | Files and analysis
 53 | ------------------
 54 | 
 55 | Here we will use the same sample files as used in REML estimation.
 56 | 
 57 | - [`simdata.txt`](https://github.com/Masuday/data/blob/master/tutorial/simdata.txt) : data file
 58 | - [`simped.txt`](https://github.com/Masuday/data/blob/master/tutorial/simped.txt) : pedigree file
 59 | 
 60 | We will apply a 4-trait animal model to this data set with the following parameter file.
 61 | 
 62 | ~~~~~{language=blupf90 caption="iodparam1.txt"}
 63 | DATAFILE
 64 | simdata.txt
 65 | NUMBER_OF_TRAITS
 66 | 4
 67 | NUMBER_OF_EFFECTS
 68 | 4
 69 | OBSERVATION(S)
 70 | 9 10 11 12
 71 | WEIGHT(S)
 72 | 
 73 | EFFECTS:
 74 | 6 6 6 6   155 cross
 75 | 7 7 7 7     2 cross
 76 | 8 8 8 8    11 cross
 77 | 1 1 1 1  4641 cross
 78 | RANDOM_RESIDUAL VALUES
 79 |  63.568  35.276  26.535  13.533
 80 |  35.276  84.627  37.831  23.306
 81 |  26.535  37.831  75.156  28.079
 82 |  13.533  23.306  28.079  46.839
 83 | RANDOM_GROUP
 84 | 4
 85 | RANDOM_TYPE
 86 | add_animal
 87 | FILE
 88 | simped.txt
 89 | (CO)VARIANCES
 90 |  37.150  19.471  23.885  24.246
 91 |  19.471  16.128  19.571  22.239
 92 |  23.885  19.571  31.315  33.782
 93 |  24.246  22.239  33.782  51.706
 94 | ~~~~~
 95 | 
 96 | Run BLUP90IOD2 with the parameter file. It takes 185 rounds to converge.
 97 | 
 98 | ~~~~~{language=output}
 99 | round =   183  eps =  0.1707E-11  time =    0.01
100 | round =   184  eps =  0.1139E-11  time =    0.01
101 | round =   185  eps =  0.9976E-12  time =    0.01
102 |   7.1286485E-03 seconds per round
103 |  * END iteration: 11-14-2016  17h 52m 05s 737
104 |  solutions stored in file: "solutions"
105 | ~~~~~
106 | 
107 | Options
108 | -------
109 | 
110 | ~~~~~{language=blupf90}
111 | OPTION conv_crit tol
112 | ~~~~~
113 | 
114 | This option defines the convergence criterion ($\varepsilon$) to stop the iterations. The real value `tol` should be small. The default is `1.0E-12`. The criterion should be chosen carefully: the default value could be too loose, whereas a stricter criterion requires more rounds to converge. The best practice is to compare solutions obtained with different convergence criteria and choose a criterion that is strict enough.
115 | 
116 | ~~~~~{language=blupf90}
117 | OPTION blksize n
118 | ~~~~~
119 | 
120 | This option uses a block-diagonal matrix as the preconditioner ($\mathbf{M}$). The integer value `n` defines the block size. By default, the preconditioner is the diagonal matrix (that is, `n` is 1). The block size should be the same as the number of traits. This option will reduce the number of iterations if the specified value is valid.
121 | 


--------------------------------------------------------------------------------
/installation_windows.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Download and Installation
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Running a program in Windows
 10 | ============================
 11 | 
 12 | BLUPF90 programs do not provide a graphical window and instead run in a "DOS"-style interface. You should prepare a parameter file that describes the names of data and pedigree files, the model, and the variance components to be used in the analysis. You will be prompted to type the name of the parameter file. If you are already experienced with this type of software, you may skip this section.
 13 | 
 14 | We recommend using the standard (traditional) method for running the program. If you are not comfortable with the Windows command line interface, refer to the last subsection for a quicker method.
 15 | 
 16 | A basic procedure
 17 | -----------------
 18 | 
 19 | ### If you have no experience running a command line program... ###
 20 | 
 21 | How to run a command-line program is a frequently asked question. The standard method is to use the Command Prompt, or simply `cmd`, which is a console that accepts commands typed with a keyboard. You can also use PowerShell instead of Command Prompt; the procedure is the same.
 22 | 
 23 | There are many tutorials available on this topic:
 24 | 
 25 | - Command Prompt - How to Use Basic Commands  
 26 |   (<http://www.digitalcitizen.life/command-prompt-how-use-basic-commands>)
 27 | - How to use the Windows command line (DOS)  
 28 |   (<http://www.computerhope.com/issues/chusedos.htm>)
 29 | - How do I run a file from MS-DOS?  
 30 |   (<http://www.computerhope.com/issues/ch000598.htm>)
 31 | - Beginners Guide: Windows Command Prompt  
 32 |   (<http://www.pcstats.com/articleview.cfm?articleID=1723>)
 33 | - How to Open Command Prompt  
 34 |   (<http://pcsupport.about.com/od/commandlinereference/f/open-command-prompt.htm>)
 35 | 
 36 | ### How to run the program ###
 37 | 
 38 | A basic procedure to run the program is as follows:
 39 | 
 40 | 1. Download the program and save it in a folder.  
 41 |    (You may optionally add the folder to your system `PATH`. If you're unsure what this is, you can skip this step.)
 42 | 2. Save the required files in the same folder.  
 43 | 3. Open the Command Prompt and use the `cd` command to change to the folder's directory.  
 44 | 4. Type the name of the program (e.g., `blupf90` or `blupf90.exe`) to launch it.  
 45 | 5. When prompted, type the name of the parameter file. Some programs may ask for additional input.  
 46 | 6. Wait for the analysis to complete.  
 47 | 7. Check and collect the results.
 48 | 
 49 | If the program runs successfully, it will ask you for the name of the parameter file with the following prompt:
 50 | 
 51 | ~~~~~{language=output}
 52 |  name of parameter file?
 53 | ~~~~~
 54 | 
 55 | Here, type the parameter file name using the keyboard. Gibbs sampling programs will ask for additional input, which you can type as prompted.
 56 | 
 57 | ### Save the output ###
 58 | 
 59 | If you want to save the output log (screen messages from the program), use redirection:
 60 | 
 61 | ~~~~~{language=shell}
 62 | blupf90 > out.txt
 63 | ~~~~~
 64 | 
 65 | This command will not display messages on the screen, but it will still accept input. You can type the parameter file name, and the program will run. The output will be saved in `out.txt`.
 66 | 
 67 | ### Omit the typing ###
 68 | 
 69 | If you prefer not to type input manually, you can use redirection. Prepare a text file that contains the name of the parameter file:
 70 | 
 71 | ~~~~~{language=text caption="in.txt"}
 72 | parameter.txt
 73 | ~~~~~
 74 | 
 75 | Then run the program as follows:
 76 | 
 77 | ~~~~~{language=shell}
 78 | blupf90 < in.txt > out.txt
 79 | ~~~~~
 80 | 
 81 | This technique is useful for Gibbs sampling programs that require several inputs. You can write multiple lines in `in.txt` instead of typing them interactively.
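
The same redirection can also be scripted. The following Python sketch is one possible way to do it, using only the standard library; the function name `run_with_answers` is hypothetical, and the program and file names in the usage comment are the ones used in this section.

```python
import subprocess


def run_with_answers(command, answers, log_path):
    """Run a program, feed the answers to its prompts, and save the screen output.

    command  : list with the program name and optional arguments
    answers  : list of lines to be typed at the prompts (same role as in.txt)
    log_path : file that receives the screen output (same role as out.txt)
    """
    stdin_text = "\n".join(answers) + "\n"
    with open(log_path, "w") as log:
        completed = subprocess.run(command, input=stdin_text, stdout=log, text=True)
    return completed.returncode


# Usage, equivalent to "blupf90 < in.txt > out.txt" with parameter.txt in in.txt:
# run_with_answers(["blupf90"], ["parameter.txt"], "out.txt")
```

For Gibbs sampling programs, add the additional answers to the list in the order in which the program asks for them.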
 82 | 
 83 | ### Stop the program ###
 84 | 
 85 | To stop the program immediately, press `Ctrl + C`. You can hold the `Ctrl` key and press `C` once. The program will stop immediately, and you can safely ignore the termination messages. In some cases involving multi-threaded computation, the program may continue running even after the stop signal. In that case, close the Command Prompt window to terminate it completely.
 86 | 
 87 | Quick run
 88 | ---------
 89 | 
 90 | You can also run the program without using Command Prompt:
 91 | 
 92 | 1. Save all required files in a folder.  
 93 | 2. Download the program and place it in the same folder.  
 94 | 3. Double-click the program; a black window will appear.  
 95 | 4. Enter the parameter file name using the keyboard. If the program asks for additional input, type the answers.  
 96 | 5. Wait for the analysis to finish. Do not close the window manually. It will close automatically when the analysis is complete.  
 97 | 6. Check the output files, which will be saved in the same folder.
 98 | 
 99 | To save the screen output (e.g., to `out.txt`), download the following batch file (`run.bat`) and place it in the same folder. Open the file with Notepad, edit the program name as needed, and save it. Then double-click `run.bat`. A black window will open, and although nothing will appear on screen, you can type the parameter file name. The program will run silently, and the output will be saved in `out.txt`.
100 | 
101 | Running the program this way (by double-clicking) is not recommended because any error message will disappear when the window closes.
102 | 
103 | ~~~~~{language=text caption="run.bat"}
104 | blupf90 > out.txt
105 | ~~~~~
106 | 
107 | This method may not work in some cases due to hidden file extensions. We do not officially support this method.
108 | 


--------------------------------------------------------------------------------
/mrode_c11ex112_mixed_snp.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Mixed linear model for computing SNP effects
 10 | ============================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | When we have many SNP markers, we can expect the markers to capture almost all of the additive genetic variation. In this case, the marker effects should be treated as random because there are too many effects to estimate relative to the number of observations. With the random marker model, we might not need the polygenic effect. In this example, the author assumes the following model:
 16 | $$
 17 | y_{i} = \mu + \sum_{k=1}^{m}z_{ik} g_{k} + e_{i}
 18 | $$
 19 | where $y_i$ is the observation of animal $i$, $\mu$ is the general mean, $m$ is the number of markers to be considered, $z_{ik}$ is the $k$-th weighted marker genotype of the animal, that is, the $(i,k)$ element of $\mathbf{Z}$, $g_k$ is the $k$-th random SNP effect, and $e_i$ is the residual effect. In matrix notation, the model is $\mathbf{y} = \mathbf{1}\mu + \mathbf{Zg} + \mathbf{e}$. The system of mixed model equations, with $\mathbf{X} = \mathbf{1}$ and $\mathbf{b} = \mu$, is
 20 | $$
 21 | \left[
 22 | \begin{array}{ll}
 23 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z}\\
 24 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z}+\mathbf{I}/\sigma_g^2
 25 | \end{array}
 26 | \right]
 27 | \left[
 28 | \begin{array}{c}
 29 | \mathbf{\hat{b}}\\
 30 | \mathbf{\hat{g}}
 31 | \end{array}
 32 | \right]
 33 | =
 34 | \left[
 35 | \begin{array}{l}
 36 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\
 37 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \\
 38 | \end{array}
 39 | \right].
 40 | $$
 41 | The residual variance was shown as $\sigma_e^2 = 245.0$ in the textbook. The marker variance ($\sigma_g^2$) can be derived from the additive genetic variance ($\sigma_u^2$) using $\sigma_g^2 = \sigma_u^2/\left[2\sum p_j(1-p_j)\right]$, where $p_j$ is the allele frequency for marker $j$. In this example, the author uses this equation and shows $2\sum p_j(1-p_j) = 3.5383$, so the marker variance is $\sigma_g^2 = 35.242/3.5383 = 9.96$.
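
The conversion from the additive genetic variance to the marker variance can be sketched in a few lines of Python. This is only an illustration of the formula above, not part of the BLUPF90 programs. The function name `marker_variance` and the allele frequencies in `example_freqs` are hypothetical; only the values 35.242 and 3.5383 come from the text.

```python
def marker_variance(sigma_u2, allele_freqs):
    """Return sigma_g^2 = sigma_u^2 / (2 * sum_j p_j * (1 - p_j))."""
    scale = 2.0 * sum(p * (1.0 - p) for p in allele_freqs)
    if scale <= 0.0:
        raise ValueError("all markers are monomorphic; the denominator is zero")
    return sigma_u2 / scale


# Hypothetical allele frequencies, for illustration only (not the textbook values).
example_freqs = [0.5, 0.25, 0.1]
example_sigma_g2 = marker_variance(35.242, example_freqs)

# With the denominator reported in the text, the formula is a single division.
sigma_g2 = 35.242 / 3.5383
print(round(sigma_g2, 2))  # 9.96
```

The rounded value is the one written on the diagonal of `(CO)VARIANCES` in the parameter file.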
 42 | 
 43 | 
 44 | Files
 45 | -----
 46 | 
 47 | The data file (`data_mr11b.txt`) now contains 10 columns from $\mathbf{Z}$ (p.184).
 48 | 
 49 | 1. Animal ID
 50 | 2. Sire ID
 51 | 3. Dam ID
 52 | 4. General mean
 53 | 5. EDC
 54 | 6. Phenotype (Fat DYD)
 55 | 7. Weight = inverse of EDC
 56 | 8. Covariate for SNP 1
 57 | 9. Covariate for SNP 2
 58 | 10. Covariate for SNP 3
 59 | 11. Covariate for SNP 4
 60 | 12. Covariate for SNP 5
 61 | 13. Covariate for SNP 6
 62 | 14. Covariate for SNP 7
 63 | 15. Covariate for SNP 8
 64 | 16. Covariate for SNP 9
 65 | 17. Covariate for SNP 10
 66 | 
 67 | We can use the same pedigree defined as before (`pedigree_mr11b.txt`).
 68 | 
 69 | The parameter file contains 10 SNP effects.
 70 | 
 71 | ~~~~~{language=blupf90 caption="param_mr11b.txt"}
 72 | DATAFILE
 73 | data_mr11b.txt
 74 | NUMBER_OF_TRAITS
 75 | 1
 76 | NUMBER_OF_EFFECTS
 77 | 11
 78 | OBSERVATION(S)
 79 | 6
 80 | WEIGHT(S)
 81 | 7
 82 | EFFECTS:
 83 |  4  1 cross   # general mean
 84 |  8  1 cov     # SNP effect 1
 85 |  9  1 cov     # SNP effect 2
 86 | 10  1 cov     # SNP effect 3
 87 | 11  1 cov     # SNP effect 4
 88 | 12  1 cov     # SNP effect 5
 89 | 13  1 cov     # SNP effect 6
 90 | 14  1 cov     # SNP effect 7
 91 | 15  1 cov     # SNP effect 8
 92 | 16  1 cov     # SNP effect 9
 93 | 17  1 cov     # SNP effect 10
 94 | RANDOM_RESIDUAL VALUES
 95 | 245.0
 96 | RANDOM_GROUP
 97 | 2 3 4 5 6 7 8 9 10 11
 98 | RANDOM_TYPE
 99 | diagonal
100 | FILE
101 | 
102 | (CO)VARIANCES
103 | 9.96 0 0 0 0 0 0 0 0 0
104 | 0 9.96 0 0 0 0 0 0 0 0
105 | 0 0 9.96 0 0 0 0 0 0 0
106 | 0 0 0 9.96 0 0 0 0 0 0
107 | 0 0 0 0 9.96 0 0 0 0 0
108 | 0 0 0 0 0 9.96 0 0 0 0
109 | 0 0 0 0 0 0 9.96 0 0 0
110 | 0 0 0 0 0 0 0 9.96 0 0
111 | 0 0 0 0 0 0 0 0 9.96 0
112 | 0 0 0 0 0 0 0 0 0 9.96
113 | OPTION solv_method FSPAK
114 | ~~~~~
115 | 
116 | In the above parameter file, we defined 10 SNP effects as one group of random effects. The covariances among the effects are 0, so all the SNP effects are independent of each other. This description is equivalent to defining each SNP effect separately (that is, with 10 `RANDOM_GROUP` blocks). A user can confirm that these 2 parameter files produce the same results.
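
Typing a large diagonal matrix by hand is error-prone when there are many markers. The following Python sketch generates the random-effect section of the parameter file for a block of consecutive effects with a common variance. The helper names `diagonal_block` and `random_group_section` are hypothetical; the keywords and layout simply mirror the parameter file shown above.

```python
def diagonal_block(n_effects, variance):
    """Return the rows of an n x n diagonal (co)variance matrix as text lines."""
    rows = []
    for i in range(n_effects):
        row = ["0"] * n_effects
        row[i] = f"{variance:g}"
        rows.append(" ".join(row))
    return rows


def random_group_section(first_effect, n_effects, variance):
    """Build the RANDOM_GROUP ... (CO)VARIANCES text for consecutive effects."""
    effects = " ".join(str(first_effect + i) for i in range(n_effects))
    lines = [
        "RANDOM_GROUP", effects,
        "RANDOM_TYPE", "diagonal",
        "FILE", "",          # the line after FILE is blank: no pedigree file
        "(CO)VARIANCES",
    ]
    lines += diagonal_block(n_effects, variance)
    return "\n".join(lines)


# Effects 2 to 11 are the 10 SNP effects; 9.96 is the marker variance.
section = random_group_section(2, 10, 9.96)
print(section)
```

The printed text can be pasted into the parameter file in place of the hand-written block.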
117 | 
118 | 
119 | Solutions
120 | ---------
121 | 
122 | We should look carefully at the results shown in the textbook (p.185). The solutions from the weighted analysis in the textbook seem inaccurate: they come from an analysis that used EDC (column 5) as the weight. The correct weight is the inverse of EDC (column 7), so the following solutions from our analysis should be correct.
123 | 
124 | ~~~~~{language=text caption="solutions"}
125 | trait/effect level  solution
126 |    1   1         1          9.12440501
127 |    1   2         1          0.00004355
128 |    1   3         1         -0.00440133
129 |    1   4         1          0.00439876
130 |    1   5         1         -0.00104827
131 |    1   6         1          0.00048476
132 |    1   7         1          0.00229457
133 |    1   8         1          0.00000000
134 |    1   9         1         -0.00000000
135 |    1  10         1          0.00179833
136 |    1  11         1         -0.00125140
137 | ~~~~~
138 | 
139 | The solutions from the unweighted analysis are shown below.
140 | 
141 | ~~~~~{language=text caption="solutions"}
142 | trait/effect level  solution
143 |    1   1         1          9.94392543
144 |    1   2         1          0.08702093
145 |    1   3         1         -0.31079216
146 |    1   4         1          0.26246003
147 |    1   5         1         -0.08027711
148 |    1   6         1          0.11020813
149 |    1   7         1          0.13908022
150 |    1   8         1         -0.00000000
151 |    1   9         1          0.00000000
152 |    1  10         1         -0.06069044
153 |    1  11         1         -0.01580233
154 | ~~~~~
155 | 


--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Introduction to BLUPF90 suite programs
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | ## Revision History
 10 | 
 11 | - [Revision History](./history.html)
 12 | 
 13 | ## Acknowledgment
 14 | 
 15 | - [Acknowledgment](./acknowledgment.html)
 16 | 
 17 | ## License
 18 | 
 19 | - [License](./license.html)
 20 | 
 21 | ## Introduction
 22 | 
 23 | - [Short introduction to BLUPF90 programs](./introduction_short.html)
 24 | - [Differences with other software](./introduction_difference.html)
 25 | - [About the software](./introduction_condition.html)
 26 | - [About this tutorial](./introduction_about.html)
 27 | - [New programs](./introduction_plus.html)
 28 | 
 29 | ## Download and Installation
 30 | 
 31 | - [Introduction](./installation_start.html)
 32 | - [Availability](./installation_availability.html)
 33 | - [Running a program in Windows](./installation_windows.html)
 34 | - [Running a program in Linux and MacOS X](./installation_linux.html)
 35 | - [Additional settings](./installation_env.html)
 36 | - [Setup a text editor](./installation_editor.html)
 37 | 
 38 | ## Quick tour of BLUPF90
 39 | 
 40 | - [Introduction](./quicktour_start.html)
 41 | - [Trivial analyses for fixed models](./quicktour_fixed.html)
 42 | - [Trivial analyses for mixed models](./quicktour_mixed.html)
 43 | - [Trivial analyses for single-step GBLUP](./quicktour_ssgblup.html)
 44 | - [Trivial analyses for multiple-trait models](./quicktour_mt.html)
 45 | 
 46 | ## Data preparation with RENUMF90
 47 | 
 48 | - [Introduction](./renum_start.html)
 49 | - [Basic data preparation](./renum_basic.html)
 50 | - [Animal model with pedigree file](./renum_pedigree.html)
 51 | - [Genomic model with SNP-marker file](./renum_genomic.html)
 52 | - [Multiple-trait models](./renum_mt.html)
 53 | - [Advanced usage of RENUMF90](./renum_advanced.html)
 54 | - [What if I don't want to use RENUMF90?](./renum_norenum.html)
 55 | 
 56 | ## Variance component estimation
 57 | 
 58 | - [Restricted (residual) maximum likelihood with AIREMLF90 (BLUPF90+)](./vc_aireml.html)
 59 | - [Gibbs sampling and post-Gibbs analysis](./vc_gs.html)
 60 | - [Advanced usage of AIREMLF90 (BLUPF90+)](./vc_advanced_aireml.html)
 61 | - [Advanced features for Gibbs sampling programs](./vc_advanced_gs.html)
 62 | 
 63 | ## Numerical examples from Mrode (2014)
 64 | 
 65 | - [Introduction](./mrode_start.html)
 66 | - [Chapter 3. Example 3.1 Animal model](./mrode_c03ex031_animal_model.html)
 67 | - [Chapter 3. Example 3.2. Sire model](./mrode_c03ex032_sire_model.html)
 68 | - [Chapter 3. Example 3.3. Reduced animal model](./mrode_c03ex033_reduced_animal_model.html)
 69 | - [Chapter 3. Example 3.4 Animal model with groups](./mrode_c03ex034_animal_model_with_groups.html)
 70 | - [Chapter 4. Example 4.1 Repeatability model](./mrode_c04ex041_repeatability_model.html)
 71 | - [Chapter 4. Example 4.2. Model with common environmental effects](./mrode_c04ex042_common_environment.html)
 72 | - [Chapter 5. Example 5.1. Multiple-trait model, Equal design matrices and no missing records](./mrode_c05ex051_mt_equal_design.html)
 73 | - [Chapter 5. Example 5.2. Multiple-trait model, Equal design matrices with missing records](./mrode_c05ex052_mt_missing.html)
 74 | - [Chapter 5. Example 5.3. Multiple-trait model, Unequal design matrices](./mrode_c05ex053_mt_unequal_design.html)
 75 | - [Chapter 5. Example 5.4. Multivariate models with no environmental covariance](./mrode_c05ex054_mt_no_covariance.html)
 76 | - [Chapter 7. Example 7.1. Animal model for a maternal trait](./mrode_c07ex071_maternal.html)
 77 | - [Chapter 8. Example 8.1. Animal model with social interaction effects](./mrode_c08ex081_social_interaction.html)
 78 | - [Chapter 9. Example 9.1. Fixed regression model](./mrode_c09ex091_fixed_regression.html)
 79 | - [Chapter 9. Example 9.2. Random regression model](./mrode_c09ex092_random_regression.html)
 80 | - [Chapter 10. Example 10.2. Prediction of breeding values with marker information](./mrode_c10ex102_marker_information.html)
 81 | - [Chapter 10. Example 10.3. Directly predicting the additive genetic merit at the QTL](./mrode_c10ex103_qtl.html)
 82 | - [Chapter 11. Example 11.1. Fixed effect model for SNP effects](./mrode_c11ex111_fixed_snp.html)
 83 | - [Chapter 11. Example 11.2. Mixed linear model for computing SNP effects](./mrode_c11ex112_mixed_snp.html)
 84 | - [Chapter 11. Example 11.3. Equivalent models -- GBLUP](./mrode_c11ex113_gblup.html)
 85 | - [Chapter 11. Example 11.5. Mixed linear models with polygenic effects](./mrode_c11ex115_polygenic.html)
 86 | - [Chapter 11. Example 11.6. Single-step approach](./mrode_c11ex116_ssgblup.html)
 87 | - [Chapter 12. Example 12.1. Animal model with dominance effect](./mrode_c12ex121_dominance.html)
 88 | - [Chapter 12. Example 12.3. Method for rapid inversion of the dominance matrix](./mrode_c12ex123_dominance_inverse.html)
 89 | - [Chapter 13. Example 13.1. The threshold model](./mrode_c13ex131_threshold.html)
 90 | - [Chapter 13. Example 13.2. Joint analysis of quantitative and binary traits](./mrode_c13ex132_threshold_linear.html)
 91 | 
 92 | ## Large-scale genetic evaluation
 93 | - [Introduction](./largescale_start.html)
 94 | - [Issues in a large scale analysis](./largescale_issues.html)
 95 | - [Iteration on data with preconditioned conjugate gradient (PCG)](./largescale_pcg.html)
 96 | - [Approximation of accuracy and reliability](./largescale_reliability.html)
 97 | - [REML estimation with large data](./largescale_reml.html)
 98 | 
 99 | ## Practical genomic analysis
100 | - [Introduction](./genomic_start.html)
101 | - [Files used in genomic analysis](./genomic_files.html)
102 | - [Quality control of SNP markers](./genomic_qc.html)
103 | - [Tuning and input/output of relationship matrices](./genomic_tuning.html)
104 | - [Performing GBLUP](./genomic_gblup.html)
105 | - [GWAS using the ssGBLUP framework](./genomic_gwas.html)
106 | 
107 | ## References
108 | 
109 | - [References](./references.html)
110 | 


--------------------------------------------------------------------------------
/mrode_c08ex081_social_interaction.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Social interaction effects
 10 | ==========================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | When an animal is raised with a few other animals in a limited space (for example, a pen or cage), there is social interaction (for example, competition) among the animals. The animal's phenotype is affected by its own genetic and environmental effects as well as by the competitors' contributions. The competitors' effects look like environmental contributions to the animal. This structure is similar to the maternal animal model introduced in the previous section. Some of the competitors' contributions come from genes carried by the competitors, and the rest come from non-genetic factors. The genetic component can be correlated with the animal's direct genetic effect, so the statistical model needs a more complicated covariance structure.
 16 | 
 17 | In this example, we have 3 pens and each pen has 3 animals ($n = 3$). See the textbook for the detailed formulation.
 18 | 
 19 | The model contains sex as the fixed effect, and the direct and associative genetic effects, pen (group), and common environmental (full-sib) effects as random effects. The author assumes $n = 3$ and $\sigma_g^2 = 12.12$ (group variance), $\sigma_e^2 = 60.6$ (the original residual variance), $\sigma_e^{2*} =60.6 - 12.12 = 48.48$ (the final residual variance), $\sigma_c^2 = 12.5$ (common environmental variance) and
 20 | $$
 21 | \mathbf{G}_{0}
 22 | =
 23 | \left[
 24 | \begin{array}{ll}
 25 | 25.70&2.25\\
 26 | 2.25&3.60
 27 | \end{array}
 28 | \right].
 29 | $$
 30 | 
 31 | 
 32 | Files
 33 | -----
 34 | 
 35 | The data set is shown on p.125 (`data_mr08a.txt`). We added 3 columns (columns 7 to 9), which include the competitors' IDs.
 36 | 
 37 | ~~~~~{language=text caption="data_mr08a.txt"}
 38 |  7 1 4 1 1  5.50   8  9  1
 39 |  8 1 4 1 2  9.80   7  9  1
 40 | ...
 41 | ~~~~~
 42 | 
 43 | Each column has the following information.
 44 | 
 45 |    1. Animal ID
 46 |    2. Sire ID
 47 |    3. Dam ID
 48 |    4. Pen (group)
 49 |    5. Sex (1=male and 2=female)
 50 |    6. Growth rate (10$\times$g/day)
 51 |    7. ID of competitor 1 in the same pen
 52 |    8. ID of competitor 2 in the same pen
 53 |    9. Full-sib code (common environment)
 54 | 
 55 | The pedigree is derived from the data set.
 56 | 
 57 | The parameter file looks like a maternal animal model. See the textbook for details and make sure the order of effects is correct (p.126).
 58 | 
 59 | ~~~~~{language=text caption="param_mr08a.txt"}
 60 | DATAFILE
 61 | data_mr08a.txt
 62 | NUMBER_OF_TRAITS
 63 | 1
 64 | NUMBER_OF_EFFECTS
 65 | 6
 66 | OBSERVATION(S)
 67 | 6
 68 | WEIGHT(S)
 69 | 
 70 | EFFECTS:
 71 |  5  2 cross  # sex
 72 |  1 15 cross  # direct genetic
 73 |  7  0 cross  # associative genetic 1
 74 |  8 15 cross  # associative genetic 2
 75 |  9  3 cross  # common environmental
 76 |  4  3 cross  # random group (pen)
 77 | RANDOM_RESIDUAL VALUES
 78 | 48.48
 79 | RANDOM_GROUP
 80 | 2 3
 81 | RANDOM_TYPE
 82 | add_animal
 83 | FILE
 84 | pedigree_mr08a.txt
 85 | (CO)VARIANCES
 86 | 25.7 2.25
 87 | 2.25 3.60
 88 | RANDOM_GROUP
 89 | 5
 90 | RANDOM_TYPE
 91 | diagonal
 92 | FILE
 93 | 
 94 | (CO)VARIANCES
 95 | 12.5
 96 | RANDOM_GROUP
 97 | 6
 98 | RANDOM_TYPE
 99 | diagonal
100 | FILE
101 | 
102 | (CO)VARIANCES
103 | 12.12
104 | OPTION solv_method FSPAK
105 | ~~~~~
106 | 
107 | The parameter file is tricky. The 3rd effect in `EFFECTS:` is defined with 0 levels. It works together with the 4th effect to put two elements on the same row of the incidence matrix $\mathbf{Z}_S$ (see p.126). Each statement in `EFFECTS:` can put only one element on a row of the system of mixed model equations. If we define an effect with 0 levels, it is not recognized as a new effect but is combined with the next effect. In this case, the 3rd effect is processed as usual, but the offset address is not incremented, so the element from the 4th effect is put on the same row of $\mathbf{Z}_S$.
108 | 
109 | We can see what happens when the program processes the data with those 2 statements (effects 3 and 4). Consider the first line in the data file.
110 | 
111 |      7   1   4   1    1   5.50    8   9   1
112 | 
113 | The 2 statements perform:
114 | 
115 | - to read the 7th column (the value is 8); and to put 1 (because of cross effect) on column 8 in the 1st
116 |   row in $\mathbf{Z}_S$.
117 | - to read the 8th column (the value is 9); and to put 1 (because of cross effect) on column 9 in the 1st
118 |   row in $\mathbf{Z}_S$.
119 | 
120 | The resulting row in $\mathbf{Z}_S$ is
121 | $$
122 | \left[
123 | \begin{array}{ccccccccccccccc}
124 | 0& 0& 0& 0& 0& 0& 0& 1& 1& 0& 0& 0& 0& 0& 0
125 | \end{array}
126 | \right].
127 | $$
128 | 
129 | The textbook omits the first 8 columns in $\mathbf{Z}_S$, so the above row is identical to the one in the textbook. Next, we consider the 2nd line in the data file.
130 | 
131 |     8     1   4   1   2   9.80     7   9   1
132 | 
133 | The 2 statements perform:
134 | 
135 | - to read the 7th column (the value is 7); and to put 1 (because of cross effect) on column 7 in the 2nd row in $\mathbf{Z}_S$.
136 | - to read the 8th column (the value is 9); and to put 1 (because of cross effect) on column 9 in the 2nd row in $\mathbf{Z}_S$.
137 | The resulting row in $\mathbf{Z}_S$ is
138 | $$
139 | \left[
140 | \begin{array}{ccccccccccccccc}
141 |     0& 0& 0& 0& 0& 0& 1& 0& 1& 0& 0& 0& 0& 0& 0
142 | \end{array}
143 | \right].
144 | $$
145 | This is identical to the 2nd row in $\mathbf{Z}_{S}$ in the textbook.
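
The combined effect of the two statements can be mimicked with a short Python sketch. It only reproduces the resulting rows of $\mathbf{Z}_S$ for the 2 data lines shown above; it is not how BLUPF90 is implemented internally, and the function name `zs_row` is hypothetical.

```python
def zs_row(competitor_ids, n_levels):
    """Build one row of Z_S: add 1 at the column of each competitor.

    Both competitor columns write into the same block of n_levels columns,
    which is what the 0-level effect followed by the 15-level effect achieves.
    """
    row = [0] * n_levels
    for animal_id in competitor_ids:
        row[animal_id - 1] += 1  # animal IDs are 1-based
    return row


# (animal ID, competitor IDs) taken from the first 2 lines of the data file.
records = [
    (7, [8, 9]),
    (8, [7, 9]),
]
for animal, competitors in records:
    print(animal, zs_row(competitors, 15))
```

Each printed row has 15 columns, one per animal in the pedigree, with 1 in the columns of the 2 competitors.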
146 | 
147 | The specification of `RANDOM_GROUP` is also tricky. In the first `RANDOM_GROUP`, we specify only effects 2 and 3. Effect 4 should be omitted because it is the same effect as effect 3. Also, BLUPF90 accepts only consecutive effects in `RANDOM_GROUP`, so we put 2 and 3 here.
148 | 
149 | 
150 | Solutions
151 | ---------
152 | 
153 | The solutions are very similar to the values in the textbook (p.126).
154 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
  1 | 
  2 | PANDOC = pandoc
  3 | 
  4 | TEXSRC = \
  5 |     history.tex \
  6 |     acknowledgment.tex \
  7 |     license.tex \
  8 |     genomic_files.tex \
  9 |     genomic_gblup.tex \
 10 |     genomic_gwas.tex \
 11 |     genomic_qc.tex \
 12 |     genomic_start.tex \
 13 |     genomic_tuning.tex \
 14 |     installation_availability.tex \
 15 |     installation_editor.tex \
 16 |     installation_env.tex \
 17 |     installation_linux.tex \
 18 |     installation_start.tex \
 19 |     installation_windows.tex \
 20 |     introduction_about.tex \
 21 |     introduction_condition.tex \
 22 |     introduction_difference.tex \
 23 |     introduction_short.tex \
 24 |     introduction_plus.tex \
 25 |     largescale_issues.tex \
 26 |     largescale_pcg.tex \
 27 |     largescale_reliability.tex \
 28 |     largescale_reml.tex \
 29 |     largescale_start.tex \
 30 |     mrode_c03ex031_animal_model.tex \
 31 |     mrode_c03ex032_sire_model.tex \
 32 |     mrode_c03ex033_reduced_animal_model.tex \
 33 |     mrode_c03ex034_animal_model_with_groups.tex \
 34 |     mrode_c04ex041_repeatability_model.tex \
 35 |     mrode_c04ex042_common_environment.tex \
 36 |     mrode_c05ex051_mt_equal_design.tex \
 37 |     mrode_c05ex052_mt_missing.tex \
 38 |     mrode_c05ex053_mt_unequal_design.tex \
 39 |     mrode_c05ex054_mt_no_covariance.tex \
 40 |     mrode_c07ex071_maternal.tex \
 41 |     mrode_c08ex081_social_interaction.tex \
 42 |     mrode_c09ex091_fixed_regression.tex \
 43 |     mrode_c09ex092_random_regression.tex \
 44 |     mrode_c10ex102_marker_information.tex \
 45 |     mrode_c10ex103_qtl.tex \
 46 |     mrode_c11ex111_fixed_snp.tex \
 47 |     mrode_c11ex112_mixed_snp.tex \
 48 |     mrode_c11ex113_gblup.tex \
 49 |     mrode_c11ex115_polygenic.tex \
 50 |     mrode_c11ex116_ssgblup.tex \
 51 |     mrode_c12ex121_dominance.tex \
 52 |     mrode_c12ex123_dominance_inverse.tex \
 53 |     mrode_c13ex131_threshold.tex \
 54 |     mrode_c13ex132_threshold_linear.tex \
 55 |     mrode_start.tex \
 56 |     quicktour_fixed.tex \
 57 |     quicktour_mixed.tex \
 58 |     quicktour_mt.tex \
 59 |     quicktour_ssgblup.tex \
 60 |     quicktour_start.tex \
 61 |     references.tex \
 62 |     renum_norenum.tex \
 63 |     renum_advanced.tex \
 64 |     renum_basic.tex \
 65 |     renum_genomic.tex \
 66 |     renum_mt.tex \
 67 |     renum_pedigree.tex \
 68 |     renum_start.tex \
 69 |     vc_advanced_aireml.tex \
 70 |     vc_advanced_gs.tex \
 71 |     vc_aireml.tex \
 72 |     vc_gs.tex
 73 | 
 74 | HTMLSRC = \
 75 |     history.html \
 76 |     acknowledgment.html \
 77 |     license.html \
 78 |     genomic_files.html \
 79 |     genomic_gblup.html \
 80 |     genomic_gwas.html \
 81 |     genomic_qc.html \
 82 |     genomic_start.html \
 83 |     genomic_tuning.html \
 84 |     installation_availability.html \
 85 |     installation_editor.html \
 86 |     installation_env.html \
 87 |     installation_linux.html \
 88 |     installation_start.html \
 89 |     installation_windows.html \
 90 |     introduction_about.html \
 91 |     introduction_condition.html \
 92 |     introduction_difference.html \
 93 |     introduction_plus.html \
 94 |     introduction_short.html \
 95 |     largescale_issues.html \
 96 |     largescale_pcg.html \
 97 |     largescale_reliability.html \
 98 |     largescale_reml.html \
 99 |     largescale_start.html \
100 |     mrode_c03ex031_animal_model.html \
101 |     mrode_c03ex032_sire_model.html \
102 |     mrode_c03ex033_reduced_animal_model.html \
103 |     mrode_c03ex034_animal_model_with_groups.html \
104 |     mrode_c04ex041_repeatability_model.html \
105 |     mrode_c04ex042_common_environment.html \
106 |     mrode_c05ex051_mt_equal_design.html \
107 |     mrode_c05ex052_mt_missing.html \
108 |     mrode_c05ex053_mt_unequal_design.html \
109 |     mrode_c05ex054_mt_no_covariance.html \
110 |     mrode_c07ex071_maternal.html \
111 |     mrode_c08ex081_social_interaction.html \
112 |     mrode_c09ex091_fixed_regression.html \
113 |     mrode_c09ex092_random_regression.html \
114 |     mrode_c10ex102_marker_information.html \
115 |     mrode_c10ex103_qtl.html \
116 |     mrode_c11ex111_fixed_snp.html \
117 |     mrode_c11ex112_mixed_snp.html \
118 |     mrode_c11ex113_gblup.html \
119 |     mrode_c11ex115_polygenic.html \
120 |     mrode_c11ex116_ssgblup.html \
121 |     mrode_c12ex121_dominance.html \
122 |     mrode_c12ex123_dominance_inverse.html \
123 |     mrode_c13ex131_threshold.html \
124 |     mrode_c13ex132_threshold_linear.html \
125 |     mrode_start.html \
126 |     quicktour_fixed.html \
127 |     quicktour_mixed.html \
128 |     quicktour_mt.html \
129 |     quicktour_ssgblup.html \
130 |     quicktour_start.html \
131 |     references.html \
132 |     renum_norenum.html \
133 |     renum_advanced.html \
134 |     renum_basic.html \
135 |     renum_genomic.html \
136 |     renum_mt.html \
137 |     renum_pedigree.html \
138 |     renum_start.html \
139 |     vc_advanced_aireml.html \
140 |     vc_advanced_gs.html \
141 |     vc_aireml.html \
142 |     vc_gs.html \
143 |     index.html
144 | 
145 | .PHONY: all img clean
146 | 
147 | .SUFFIXES: .tex .md .html
148 | 
149 | all: tutorial_blupf90.pdf $(HTMLSRC) img
150 | 
151 | tutorial_blupf90.pdf: tutorial_blupf90.tex $(TEXSRC)
152 | 	pdflatex -halt-on-error tutorial_blupf90
153 | #	makeindex tutorial_blupf90
154 | 	pdflatex -halt-on-error tutorial_blupf90
155 | 	cp tutorial_blupf90.pdf pdf/
156 | 
157 | acknowledgment.tex: acknowledgment.md
158 | #	pandoc -t latex --listings -o $@ $<
159 | 	pandoc -t latex --lua-filter=remove-softbreaks.lua --listings -o $@ $<
160 | 
161 | .md.tex:
162 | #   pandoc -t latex --top-level-division=section --listings -o $@ $<
163 | 	pandoc -t latex --top-level-division=section --lua-filter=remove-softbreaks.lua --listings -o $@ $<
164 | 
165 | .md.html:
166 | #	pandoc --mathjax -smart -s -t html --toc --toc-depth=2 --template GitHub.html5.txt -o $@ $<
167 | #	pandoc --mathjax -t html --toc --toc-depth=2 --template GitHub.html5.txt -o $@ $<
168 | 	pandoc --mathjax -t html --toc --toc-depth=2 --lua-filter=remove-softbreaks.lua --template GitHub.html5.txt -o $@ $<
169 | 	mv $@ html/
170 | 
171 | img:
172 | 	cp -p *.png html/
173 | 
174 | clean:
175 | 	rm -f *~ *.html tutorial*.pdf *.aux *.log *.out *.toc *.idx pdf/*.pdf html/*.html html/*.png
176 | 	rm -f *~ *.tex
177 | 	cp tutorial_blupf90.txt tutorial_blupf90.tex
178 | 


--------------------------------------------------------------------------------
/mrode_c07ex071_maternal.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Animal model for a maternal trait
 10 | =================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | For some traits, especially those measured early in an animal's life, the phenotypes are largely affected by the dam. For example, weaning weight is expected to be associated with the dam's milk yield. The dam provides an environment for its progeny that increases (or decreases) their phenotypic values. The dam's contribution is assumed to consist of genetic and environmental factors, so the animal model also includes the dam's non-genetic contribution.
 16 | 
 17 | In this example, the author applies a typical maternal model. Put simply, the model has two correlated additive genetic effects: the direct genetic effect due to the animal's genes and the maternal genetic effect due to the dam's contribution. Assume the following model:
 18 | $$
 19 | \mathbf{y} = \mathbf{Xb} + \mathbf{Zu} + \mathbf{Wm} + \mathbf{Sp} + \mathbf{e}
 20 | $$
 21 | where $\mathbf{y}$ is a vector of observations, $\mathbf{b}$ is a vector of fixed effects, $\mathbf{u}$ is a vector of direct genetic effects, $\mathbf{m}$ is a vector of maternal genetic effects, $\mathbf{p}$ is a vector of maternal permanent-environmental effects, $\mathbf{e}$ is a vector of residual effects, and other matrices are incidence matrices. The covariance structure is typically assumed as
 22 | $$
 23 | \mathrm{var}
 24 | \left[
 25 | \begin{array}{l}
 26 | \mathbf{u}\\
 27 | \mathbf{m}\\
 28 | \mathbf{p}\\
 29 | \mathbf{e}
 30 | \end{array}
 31 | \right]
 32 | =
 33 | \left[
 34 | \begin{array}{llll}
 35 | \mathbf{A}\sigma_{d}^2&\mathbf{A}\sigma_{dm}&\mathbf{0}&\mathbf{0}\\
 36 | \mathbf{A}\sigma_{md}&\mathbf{A}\sigma_{m}^2&\mathbf{0}&\mathbf{0}\\
 37 | \mathbf{0}&\mathbf{0}&\mathbf{I}\sigma_{p}^2&\mathbf{0}\\
 38 | \mathbf{0}&\mathbf{0}&\mathbf{0}&\mathbf{I}\sigma_{e}^2\\
 39 | \end{array}
 40 | \right]
 41 | $$
 42 | where $\sigma_d^2$ is the direct genetic variance, $\sigma_m^2$ is the maternal genetic variance, $\sigma_{dm}$ is the covariance between direct and maternal genetic effects, $\sigma_p^2$ is the permanent environmental variance, and $\sigma_e^2$ is the residual variance. Note that this model contains a genetic covariance matrix ($\mathbf{G}_0$) even though it is a single-trait model. In this numerical example, the genetic covariance matrix is
 43 | $$
 44 | \mathbf{G}_{0}
 45 | =
 46 | \left[
 47 | \begin{array}{ll}
 48 | \sigma_{d}^2&\sigma_{dm}\\
 49 | \sigma_{md}&\sigma_{m}^2
 50 | \end{array}
 51 | \right]
 52 | =
 53 | \left[
 54 | \begin{array}{ll}
 55 | 150&-40\\
 56 | -40&90
 57 | \end{array}
 58 | \right]
 59 | $$
 60 | and $\sigma_p^2=40$ and $\sigma_e^2=350$.
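
As a quick sanity check on $\mathbf{G}_0$, the following Python sketch computes the direct-maternal genetic correlation implied by the values above and verifies that the matrix is positive definite. The function names are hypothetical, and the check is an illustration rather than something the programs require you to run.

```python
import math


def genetic_correlation(var_d, var_m, cov_dm):
    """Correlation between direct and maternal genetic effects."""
    return cov_dm / math.sqrt(var_d * var_m)


def is_positive_definite_2x2(var_d, var_m, cov_dm):
    """A symmetric 2 x 2 matrix is positive definite if its leading
    principal minors (the first diagonal element and the determinant)
    are both positive."""
    return var_d > 0 and var_d * var_m - cov_dm ** 2 > 0


# Values from G_0 in this example.
r_dm = genetic_correlation(150.0, 90.0, -40.0)
print(round(r_dm, 3))                               # -0.344
print(is_positive_definite_2x2(150.0, 90.0, -40.0))  # True
```

A negative direct-maternal correlation of this size is consistent with the negative covariance written in `(CO)VARIANCES`.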
 61 | 
 62 | 
 63 | Files
 64 | -----
 65 | 
 66 | The data set is shown on p.111 (`data_mr07a.txt`).
 67 | 
 68 |    1. Animal ID (Calf)
 69 |    2. Sire ID
 70 |    3. Dam ID
 71 |    4. Herd
 72 |    5. Pen
 73 |    6. Birth weight (kg)
 74 | 
 75 | The pedigree is derived from the above data (`pedigree_mr07a.txt`).
 76 | 
 77 | The parameter file should be as follows.
 78 | 
 79 | ~~~~~{language=blupf90 caption="param_mr07a.txt"}
 80 | DATAFILE
 81 | data_mr07a.txt
 82 | NUMBER_OF_TRAITS
 83 | 1
 84 | NUMBER_OF_EFFECTS
 85 | 5
 86 | OBSERVATION(S)
 87 | 6
 88 | WEIGHT(S)
 89 | 
 90 | EFFECTS:
 91 |  4  3 cross  # herd
 92 |  5  2 cross  # pen
 93 |  1 14 cross  # direct genetic effect
 94 |  3 14 cross  # maternal genetic effect
 95 |  3  7 cross  # maternal permanent effect
 96 | RANDOM_RESIDUAL VALUES
 97 | 350.0
 98 | RANDOM_GROUP
 99 | 3 4
100 | RANDOM_TYPE
101 | add_animal
102 | FILE
103 | pedigree_mr07a.txt
104 | (CO)VARIANCES
105 | 150.0 -40.0
106 | -40.0  90.0
107 | RANDOM_GROUP
108 | 5
109 | RANDOM_TYPE
110 | diagonal
111 | FILE
112 | 
113 | (CO)VARIANCES
114 | 40.0
115 | OPTION solv_method FSPAK
116 | ~~~~~
117 | 
118 | In this model, we consider 5 effects (herd, pen, direct genetic, maternal genetic, and maternal permanent environmental effects). The first 2 effects are fixed. The 3rd effect is the direct genetic effect, so the number of levels should be 14 (the number of animals in the pedigree). The 4th effect is the maternal genetic effect; here you put the position of the dam's ID. Note that the number of levels must be 14 because the maternal effect is correlated with the direct effect, which has 14 levels (in the pedigree). In other words, each animal has both a direct breeding value and a maternal breeding value, and the solutions are calculated for all 14 animals. The 5th effect is the maternal permanent environmental effect. This effect does not use the pedigree file; it is estimated only for dams that appear in the data file. So the number of levels is only 7, because the largest dam ID in the data is 7, although the position of the effect is the same as for the previous one.[^2]
119 | 
120 | 
 121 | [^2]: In this case, even if you put 14 as the number of levels for the 5th effect, the program still works correctly. The program can run when the specified number of levels is larger than the actual number.
122 | 
123 | 
124 | The direct genetic effect and the maternal genetic effect are correlated. These effects are simultaneously listed in `RANDOM_GROUP`, which has `3 4`. This is still an animal model and you can specify `add_animal` and the pedigree file. This is a single-trait model but it has 2 correlated genetic effects so the covariance matrix ($\mathbf{G}_0$) should be $2 \times 2$ written in `(CO)VARIANCES`.
125 | 
126 | We have one more maternal effect which is an independent random effect. You just put 5 after `RANDOM_GROUP`. The type is diagonal. You do not need a pedigree file here. Only one variance component is required in `(CO)VARIANCES`.
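
As a side check that is not part of the textbook example, the $2 \times 2$ matrix in `(CO)VARIANCES` implies a particular genetic correlation between the direct and maternal effects. The following Python sketch is only an illustration (Python is not part of the BLUPF90 suite, and the variable names are hypothetical); it uses only the values written in the parameter file above.

```python
import math

# G0 as written in (CO)VARIANCES for RANDOM_GROUP 3 4
var_direct = 150.0
var_maternal = 90.0
cov_direct_maternal = -40.0

# Correlation implied by the covariance and the two variances
r_dm = cov_direct_maternal / math.sqrt(var_direct * var_maternal)
print(round(r_dm, 3))
```

A negative value here simply reflects the negative off-diagonal element of $\mathbf{G}_0$.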
127 | 
128 | 
129 | Solutions
130 | ---------
131 | 
132 | You can confirm that the estimated direct and maternal breeding values are identical to the textbook (p.113).
133 | 
 134 | As noted in the textbook, these equations have a dependency between fixed effects, so there is an infinite number of solutions for the fixed effects. The solutions above are one such set. If you want to obtain the same solutions as presented in the textbook, you can replace 1 with 0 for the herd number in the data file and run BLUPF90 again. You will see that the solutions for the fixed effects differ from the previous results. The solutions for the random effects remain unchanged.
135 | 


--------------------------------------------------------------------------------
/mrode_c13ex132_threshold_linear.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Joint analysis of quantitative and binary traits
 10 | ================================================
 11 | 
 12 | Model
 13 | -----
 14 | 
  15 | We can consider a 2-trait model with one threshold trait and one continuous trait. When we use Gibbs sampling to draw samples from the posterior distribution of a location parameter, we can simply take the posterior mean as the solution. However, if we try to calculate the solutions without Gibbs sampling, the resulting equations become nonlinear, and a complicated strategy is needed to solve them. See Wang et al. (1997) for a detailed discussion. For the threshold trait, the solutions obtained either way should be converted differently from those of the regular threshold model. The author cites Foulley et al. (1983) and explains how to do it.
  16 | 
  17 | In this section, we simply use THRGIBBS1F90 to obtain the posterior mean of each location parameter. In this example, we have only 1 threshold, so the program fixes the threshold to 0 and the residual variance to 1.0.
 18 | 
 19 | File
 20 | ----
 21 | 
  22 | The data file is from Foulley et al. (1983) and is shown below.
 23 | 
 24 | ~~~~~{language=text caption="data_mr13b.txt"}
 25 | 1 1 1 1 41.0 1
 26 | 1 1 1 1 37.5 1
 27 | 1 1 1 2 41.5 1
 28 | 1 1 2 2 40.0 1
 29 | 1 1 2 2 43.0 1
 30 | 1 1 2 2 42.0 1
 31 | 1 1 2 2 35.0 1
 32 | 2 1 1 2 46.0 1
 33 | 2 1 1 2 40.5 1
 34 | 2 1 2 2 39.0 1
 35 | 1 2 1 1 41.4 1
 36 | 1 2 1 1 43.0 2
 37 | 1 2 2 2 34.0 1
 38 | 1 2 2 1 47.0 2
 39 | 1 2 2 1 42.0 1
 40 | 2 2 2 1 44.5 1
 41 | 2 2 2 1 49.0 1
 42 | 1 3 1 1 41.6 1
 43 | 2 3 1 1 36.0 1
 44 | 2 3 1 2 42.7 1
 45 | 2 3 2 2 32.5 1
 46 | 2 3 2 2 44.4 1
 47 | 2 3 2 1 46.0 1
 48 | 1 4 2 1 47.0 2
 49 | 1 4 2 2 51.0 2
 50 | 1 4 2 2 39.0 1
 51 | 2 4 1 1 44.5 1
 52 | 1 5 1 1 40.5 1
 53 | 1 5 1 2 43.5 1
 54 | 1 5 2 1 42.5 1
 55 | 1 5 2 1 48.8 2
 56 | 1 5 2 1 38.5 1
 57 | 1 5 2 1 52.0 1
 58 | 1 5 2 2 48.0 1
 59 | 2 5 1 2 41.0 1
 60 | 2 5 1 1 50.5 2
 61 | 2 5 2 1 43.7 2
 62 | 2 5 2 1 51.0 2
 63 | 1 6 1 2 51.6 2
 64 | 1 6 1 1 45.3 2
 65 | 1 6 1 2 36.5 1
 66 | 1 6 2 1 50.5 1
 67 | 1 6 2 1 46.0 2
 68 | 1 6 2 1 45.0 1
 69 | 1 6 2 2 36.0 1
 70 | 2 6 1 2 43.5 1
 71 | 2 6 1 2 36.5 1
 72 | ~~~~~
 73 | 
 74 | This file contains 6 columns.
 75 | 
 76 | 1. Heifer origin
 77 | 2. Sire ID
 78 | 3. Season
 79 | 4. Sex (1=male and 2=female)
 80 | 5. BW (body weight)
 81 | 6. CD (calving difficulty)
 82 | 
 83 | The pedigree file contains a sire, sire of the sire, and the maternal grand-sire.
 84 | 
 85 | ~~~~~{language=text caption="pedigree_mr13b.txt"}
 86 | 1 0 0
 87 | 2 0 0
 88 | 3 1 0
 89 | 4 2 1
 90 | 5 3 2
 91 | 6 2 3
 92 | ~~~~~
 93 | 
  94 | As the author mentions, we should use the corrected genetic parameters ($\mathbf{G}_{c}$) as shown on p.235. The residual variance is 1.0 for CD and 20.0 for BW, and the residual covariance is 2.0527 because the residual correlation is 0.459. In our case, we exchange the order of the traits; that is, trait 1 is CD and trait 2 is BW. The parameter file is as follows.
 95 | 
 96 | ~~~~~{language=blupf90 caption="param_mr13b.txt"}
 97 | DATAFILE
 98 | data_mr13b.txt
 99 | NUMBER_OF_TRAITS
100 | 2
101 | NUMBER_OF_EFFECTS
102 | 4
103 | OBSERVATION(S)
104 | 6 5
105 | WEIGHT(S)
106 | 
107 | EFFECTS:      # l=H+M+E+s+e
108 | 1 1 2 cross   # origin
109 | 3 3 2 cross   # season
110 | 4 4 2 cross   # sex
111 | 2 2 6 cross   # sire
112 | RANDOM_RESIDUAL VALUES
113 | 1.0000 2.0527
114 | 2.0527 20.000
115 | RANDOM_GROUP
116 | 4
117 | RANDOM_TYPE
118 | add_sire
119 | FILE
120 | pedigree_mr13b.txt
121 | (CO)VARIANCES
122 | 0.0300 0.0302
123 | 0.0302 0.7178
124 | OPTION cat 2 0
125 | OPTION fixed_var mean
126 | ~~~~~
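
The residual covariance in `RANDOM_RESIDUAL VALUES` can be checked with a few lines of arithmetic. The following Python sketch is only an illustration (Python is not part of the BLUPF90 suite, and the function name `covariance_from_correlation` is hypothetical); it recomputes the covariance from the residual correlation and the two residual variances quoted in the text.

```python
import math

def covariance_from_correlation(r, var1, var2):
    """Return the covariance implied by a correlation r and two variances."""
    return r * math.sqrt(var1 * var2)

# Values taken from the text: residual correlation 0.459,
# residual variance 1.0 for CD and 20.0 for BW.
cov_e = covariance_from_correlation(0.459, 1.0, 20.0)
print(round(cov_e, 4))
```

The printed value should agree with the off-diagonal element 2.0527 in the parameter file.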
127 | 
 128 | Run THRGIBBS1F90 to draw 20,000 samples (saving every 10th sample) and discard the first 10,000 samples as burn-in. You will see the following solutions.
129 | 
130 | ~~~~~{language=text caption="solutions"}
131 | trait/effect level  solution        SD
132 |    1   1         1         31.68328943          8.14595680
133 |    2   1         1        169.99014434         40.28879171
134 |    1   1         2         31.39648971          8.19080303
135 |    2   1         2        170.59358068         40.25222186
136 |    1   2         1          5.79121444         11.84634633
137 |    2   2         1        -41.86423776         40.85833310
138 |    1   2         2          5.81766544         11.83550515
139 |    2   2         2        -40.61954962         40.79757133
140 |    1   3         1        -37.80144849          6.85003521
141 |    2   3         1        -84.56393688         42.77593912
142 |    1   3         2        -38.97895373          6.85779916
143 |    2   3         2        -87.77970197         42.74683265
144 |    1   4         1         -0.07369834          0.17245459
145 |    2   4         1         -0.26275007          0.75528469
146 |    1   4         2          0.04477009          0.17053692
147 |    2   4         2          0.10034538          0.80205603
148 |    1   4         3         -0.04892810          0.16824589
149 |    2   4         3         -0.19454123          0.80070554
150 |    1   4         4          0.04296839          0.17226439
151 |    2   4         4          0.16601219          0.80871849
152 |    1   4         5          0.02241590          0.16946856
153 |    2   4         5          0.33065415          0.78077812
154 |    1   4         6          0.03339339          0.17470579
155 |    2   4         6          0.16620116          0.78877094
156 | ~~~~~
157 | 
 158 | The estimated values are close to the reference values (p.237). The small differences come from the limited number of observations. If you put zero-constraints on the fixed effects, you may obtain values closer to those in the textbook.
159 | 
160 | Just for comparison, we show the results from CBLUP90THR with the same files shown above. The estimated threshold is $-0.0198$ with 100 rounds.
161 | 
162 | ~~~~~{language=text caption="solutions"}
163 | trait/effect level  solution
164 |   1  1       1     -1.0100
165 |   2  1       1     54.1493
166 |   1  1       2     -1.2648
167 |   2  1       2     54.7189
168 |   1  2       1      0.3162
169 |   2  2       1    -16.9395
170 |   1  2       2      0.3255
171 |   2  2       2    -15.6956
172 |   1  3       1      0.3685
173 |   2  3       1      6.3608
174 |   1  3       2     -0.7128
175 |   2  3       2      3.1574
176 |   1  4       1     -0.0601
177 |   2  4       1     -0.2699
178 |   1  4       2      0.0387
179 |   2  4       2      0.0690
180 |   1  4       3     -0.0499
181 |   2  4       3     -0.2158
182 |   1  4       4      0.0418
183 |   2  4       4      0.1376
184 |   1  4       5      0.0172
185 |   2  4       5      0.2843
186 |   1  4       6      0.0322
187 |   2  4       6      0.1358
188 | ~~~~~
189 | 


--------------------------------------------------------------------------------
/mrode_c03ex033_reduced_animal_model.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Reduced animal model
 10 | ====================
 11 | 
 12 | Model
 13 | -----
 14 | 
  15 | The reduced animal model is an animal model that includes only parents (that is, animals with progeny). Breeding values for animals without progeny are calculated indirectly from their parents' breeding values. We omit the description of the model and theory here. See the textbook or Quaas and Pollak (1980) for details.
  16 | 
  17 | BLUPF90 does not directly support the reduced animal model, but it can still handle it with some (very) tricky techniques. Some of the techniques used here will also be used in random regression models and social interaction models. Although there are several ways to fit a reduced animal model in BLUPF90, here we introduce an educational example that is redundant but easier to understand.
 18 | 
 19 | Files
 20 | -----
 21 | 
  22 | We can use the same pedigree file as in the previous animal model, except that animals without progeny (animals 7 and 8 in this case) are removed (`pedigree_mr03c.txt`).
  23 | 
  24 | The data file is extended with 5 additional columns. Here we show a few lines of the data file.
 25 | 
 26 | ~~~~~{language=text caption="data_mr03c.txt"}
 27 | 4 1 1 0 4.5  4 0.0 0.0 1.0  1.0
 28 | ...
 29 | 7 1 4 5 3.5  0 0.5 0.5 0.0  0.8
 30 | ...
 31 | ~~~~~
 32 | 
  33 | The 6th column (the first additional column) is the animal ID, but it is 0 if the animal has no progeny. Columns 7, 8, and 9 contain the actual values to be added to the incidence matrix $\mathbf{W}$ (see the textbook). This idea is similar to the way random regressions are supported. Columns 7 and 8 are 0.0 if the animal is a parent, or 0.5 otherwise. Column 9 is 1.0 if the animal is a parent, or 0.0 otherwise.
  34 | 
  35 | Column 10 (the last column) contains a weight to adjust the residual variance. For parent animals, the residual variance is $\sigma_e^2 = 40$ and its inverse is $1/\sigma_e^2 = 0.025$. For non-parents, in this case, the residual variance should be $\sigma_e^{2*}=\sigma_e^2+0.5\sigma_u^2= 40 + 0.5 \times 20 = 50$ and its inverse is $1/\sigma_e^{2*} = 0.020$. If a non-parent animal has a missing parent, you should apply a different coefficient to $\sigma_u^2$ (see the textbook). We use `WEIGHT(S)` to alter the residual variances. The program reads the weight from the data file and multiplies it by the inverse of the residual variance. In our example, the weight is 1.0 for parent animals and 0.8 for non-parent animals because $0.8 \times (1/\sigma_e^2) = 0.8 \times 0.025 = 0.020$, which is the correct value for non-parent animals.
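
The weight in column 10 can be reproduced with a short calculation. The following Python sketch is only an illustration under the assumptions stated in the text (Python is not part of the BLUPF90 suite; the function name `residual_weight` and the argument `d` are hypothetical). Here `d` is the coefficient on $\sigma_u^2$, which is 0.5 in this example; other values for animals with missing parents should be taken from the textbook.

```python
def residual_weight(sigma_e2, sigma_u2, d=0.5):
    """Weight that turns 1/sigma_e2 into 1/(sigma_e2 + d * sigma_u2)."""
    return sigma_e2 / (sigma_e2 + d * sigma_u2)

sigma_e2 = 40.0   # residual variance used in the parameter file
sigma_u2 = 20.0   # additive genetic variance used in the parameter file

w_parent = 1.0                                    # parents keep 1/sigma_e2
w_nonparent = residual_weight(sigma_e2, sigma_u2)  # non-parents, both parents known

# Inverse residual variance actually applied to a non-parent record
inv_var_nonparent = w_nonparent * (1.0 / sigma_e2)
print(w_parent, w_nonparent, inv_var_nonparent)
```

The non-parent weight should match the 0.8 stored in the last column of `data_mr03c.txt`.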
 36 | 
  37 | One issue is how to add these constants to $\mathbf{W}$. For a cross-classified effect, BLUPF90 usually adds only the value 1 to the incidence matrix for each record. To overcome this limitation, we use a tricky description in the `EFFECTS:` block of the parameter file.
 38 | 
 39 | ~~~~~{language=blupf90 caption="param_mr03c.txt"}
 40 | DATAFILE
 41 | data_mr03c.txt
 42 | NUMBER_OF_TRAITS
 43 | 1
 44 | NUMBER_OF_EFFECTS
 45 | 4
 46 | OBSERVATION(S)
 47 | 5
 48 | WEIGHT(S)
 49 | 10
 50 | EFFECTS:
 51 | 2 2 cross
 52 | 7 0 cov 3
 53 | 8 0 cov 4
 54 | 9 6 cov 6
 55 | RANDOM_RESIDUAL VALUES
 56 | 40.0
 57 | RANDOM_GROUP
 58 | 4
 59 | RANDOM_TYPE
 60 | add_animal
 61 | FILE
 62 | pedigree_mr03c.txt
 63 | (CO)VARIANCES
 64 | 20.0
 65 | OPTION solv_method FSPAK
 66 | ~~~~~
 67 | 
  68 | In the `EFFECTS:` block, the last 3 statements pass values from the data file into the incidence matrix. You can see that these 3 statements form one effect, and only the last statement, `9 6 cov 6`, defines the number of levels (that is, the number of animals). For each statement, the program reads a value from the data file (the position is given by the first number on that line), treats it as a covariate (because of the `cov` keyword), and puts it into the location in the design matrix corresponding to an animal ID (the column holding this ID is given by the number after `cov`).
  69 | 
  70 | Let's see what happens when the program processes these 3 statements for each record. We consider the first line of the data file.
 71 | 
 72 | ~~~~~{language=text}
 73 | 4 1 1 0 4.5       4 0.0 0.0 1.0        1.0
 74 | ~~~~~
 75 | 
 76 | - Processing the statement 2 (`7 0 cov 3`): the program reads the 7th column from the data, and recognizes it as a real value (0.0), and adds it to $\mathbf{W}$ at the location defined by the 3rd column in the data file (1st column in $\mathbf{W}$).
 77 | - Processing the statement 3 (`8 0 cov 4`): the program reads the 8th column from the data, and recognizes it as a real value (0.0), and adds it to $\mathbf{W}$ at the location defined by the 4th column in the data file (the location 0; it is ignored).
 78 | - Processing the statement 4 (`9 6 cov 6`): the program reads the 9th column from the data, and recognizes it as a real value (1.0), and adds it to $\mathbf{W}$ at the location defined by the 6th column in the data file (4th column in $\mathbf{W}$).
 79 | 
 80 | After the process, the first row in $\mathbf{W}$ should be the following.
 81 | $$
 82 | \left[
 83 | \begin{array}{cccccc}
 84 |    0.0&0.0&0.0&1.0&0.0&0.0
 85 | \end{array}
 86 | \right]
 87 | $$
 88 | 
  89 | Next, we look at the 4th line of the data file.
 90 | 
 91 | ~~~~~{language=text}
 92 | 7 1 4 5 3.5       0 0.5 0.5 0.0        0.8
 93 | ~~~~~
 94 | 
 95 | - Processing the statement 2 (`7 0 cov 3`): the program reads the 7th column from the data, and recognizes it as a real value (0.5), and adds it to $\mathbf{W}$ at the location defined by the 3rd column in the data file (4th column in $\mathbf{W}$).
 96 | - Processing the statement 3 (`8 0 cov 4`): the program reads the 8th column from the data, and recognizes it as a real value (0.5), and adds it to $\mathbf{W}$ at the location defined by the 4th column in the data file (5th column in $\mathbf{W}$).
 97 | - Processing the statement 4 (`9 6 cov 6`): the program reads the 9th column from the data, and recognizes it as a real value (0.0), and adds it to $\mathbf{W}$ at the location defined by the 6th column in the data file (the location 0; it is ignored).
 98 | 
 99 | After the process, the 4th row in $\mathbf{W}$ should be the following.
100 | $$
101 | \left[
102 | \begin{array}{cccccc}
103 | 0.0& 0.0& 0.0& 0.5& 0.5& 0.0
104 | \end{array}
105 | \right]
106 | $$
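
The two walk-throughs above can be mimicked with a short script. The following Python sketch is only an illustration of the bookkeeping described in the text, not BLUPF90's internal implementation (Python is not part of the suite, and the function name `w_row` is hypothetical). Each `(value column, ID column)` pair corresponds to one of the three `cov` statements, using 1-based column positions.

```python
def w_row(record, n_animals=6):
    """Build one row of W from one data record (a list of column values)."""
    row = [0.0] * n_animals
    # (value column, ID column) for `7 0 cov 3`, `8 0 cov 4`, `9 6 cov 6`
    for value_col, id_col in [(7, 3), (8, 4), (9, 6)]:
        value = float(record[value_col - 1])
        level = int(record[id_col - 1])
        if level > 0:              # location 0 is ignored
            row[level - 1] += value
    return row

# First and fourth records shown in the text
first = w_row([4, 1, 1, 0, 4.5, 4, 0.0, 0.0, 1.0, 1.0])
fourth = w_row([7, 1, 4, 5, 3.5, 0, 0.5, 0.5, 0.0, 0.8])
print(first)
print(fourth)
```

The two printed rows should agree with the rows of $\mathbf{W}$ written out above.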
107 | 
108 | A similar technique will be used in a competitive model (social interaction model) which needs an incidence matrix where 2 or more elements take place in one row.
109 | 
110 | 
111 | Solutions
112 | ---------
113 | 
 114 | You can see that the solutions are identical to those in the textbook (p.53). You may find other ways to fit the reduced animal model using the same technique. BLUPF90 now supports heterogeneous residual variances, which may simplify the implementation of the reduced animal model. A curious reader can try this method.
115 | 


--------------------------------------------------------------------------------
/largescale_reliability.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Large-scale genetic evaluation
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Approximation of accuracy and reliability
 10 | =========================================
 11 | 
 12 | Algorithm
 13 | ---------
 14 | 
  15 | The accuracy or reliability of an animal's EBV can be calculated from the prediction error variance (PEV), which is obtained from the diagonal elements of the inverse of the left-hand side (LHS) of the mixed model equations. For smaller data sets, the diagonals can be calculated by sparse inversion with FSPAK or YAMS. As expected, the computing cost of the inverse is very high, and the left-hand side has to be held in memory. So this direct method cannot be applied to a large-scale genetic evaluation.
  16 | 
  17 | There are two major classes of methods to approximate accuracy and reliability. One approximates the diagonal elements of the LHS after absorbing the non-genetic effects. Such an element resembles the effective number of records for an animal. The other approach focuses on the effective number of progeny (daughters) and was originally derived from selection index theory based on a progeny test. This approach is especially useful for parents.
  18 | 
  19 | ACCF90 implements the first approach. The method was described by Strabel et al. (2001) and Sanchez et al. (2008), who extended the idea of Misztal and Wiggans (1988) to more complicated models. The algorithm is iterative because the final equations that provide the approximated diagonal elements involve too many unknown variables to be solved directly.
  20 | 
  21 | ACCF90GS supports ssGBLUP. The algorithm was described by Misztal et al. (2013) and uses $\mathbf{G}^{-1}$ and $\mathbf{A}_{22}^{-1}$. This program works well for smaller data sets. For larger problems, we are developing a method that does not require explicit computation of $\mathbf{G}^{-1}$ and $\mathbf{A}_{22}^{-1}$.
 22 | 
 23 | 
 24 | Files
 25 | -----
 26 | 
 27 | We will use the following files for a 4-trait repeatability model.
 28 | 
 29 | - [`simdata_rep.txt`](https://github.com/Masuday/data/blob/master/tutorial/simdata_rep.txt) : data file
 30 | - [`simped.txt`](https://github.com/Masuday/data/blob/master/tutorial/simped.txt) : pedigree file
 31 | 
 32 | The parameter file includes several options for ACCF90.
 33 | 
 34 | ~~~~~{language=blupf90 caption="accparam1.txt"}
 35 | DATAFILE
 36 | simdata_rep.txt
 37 | NUMBER_OF_TRAITS
 38 | 4
 39 | NUMBER_OF_EFFECTS
 40 | 5
 41 | OBSERVATION(S)
 42 | 9 10 11 12
 43 | WEIGHT(S)
 44 | 
 45 | EFFECTS: POSITIONS_IN_DATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT [ EFFECT NESTED ]
 46 | 6 6 6 6   155 cross
 47 | 7 7 7 7     2 cross
 48 | 8 8 8 8    11 cross
 49 | 1 1 1 1  4641 cross
 50 | 1 1 1 1  4641 cross
 51 | RANDOM_RESIDUAL VALUES
 52 |  45.124  22.357  18.626  13.762
 53 |  22.357  44.210  22.690  18.016
 54 |  18.626  22.690  46.101  22.795
 55 |  13.762  18.016  22.795  45.274
 56 | RANDOM_GROUP
 57 | 4
 58 | RANDOM_TYPE
 59 | add_animal
 60 | FILE
 61 | simped.txt
 62 | (CO)VARIANCES
 63 |  41.967  22.512  24.058  26.907
 64 |  22.512  17.489  19.738  24.257
 65 |  24.058  19.738  28.775  34.741
 66 |  26.907  24.257  34.741  56.668
 67 | RANDOM_GROUP
 68 | 5
 69 | RANDOM_TYPE
 70 | diagonal
 71 | FILE
 72 | 
 73 | (CO)VARIANCES
 74 |  15.546  15.237  10.490  1.4105
 75 |  15.237  40.937  19.562  6.4840
 76 |  10.490  19.562  27.782  4.5379
 77 |  1.4105  6.4840  4.5379  2.4410
 78 | OPTION cg   1 1 1 1
 79 | OPTION anim 4 4 4 4
 80 | OPTION pe   5 5 5 5
 81 | OPTION acc_maxrounds 20
 82 | ~~~~~
 83 | 
 84 | The options will be explained later. ACCF90 reads the solutions file. Run BLUPF90 (or BLUPF90IOD2) with the above parameter file, and then, run ACCF90 with the same parameter file. The program creates the file `sol_and_acc` as follows (the first 9 lines are shown).
 85 | 
 86 | ~~~~~{language=text caption="solutions"}
 87 | trait/effect level  solution acc
 88 |    1   4         1         -3.43278933  0.8728
 89 |    2   4         1          1.63205564  0.8147
 90 |    3   4         1          1.86332893  0.8510
 91 |    4   4         1          5.42050028  0.9237
 92 |    1   4         2          1.35147071  0.8754
 93 |    2   4         2          1.75742269  0.8140
 94 |    3   4         2          6.60794020  0.8512
 95 |    4   4         2          5.09637833  0.9268
 96 |    1   4         3         -9.07606792  0.8820
 97 | ~~~~~
 98 | 
  99 | This file is similar to `solutions`, but only the estimated breeding values (EBV) are stored. The last column is the reliability of the EBV.
100 | 
101 | 
102 | Options
103 | -------
104 | 
105 | ~~~~~{language=blupf90}
106 | OPTION cg   [ position(s)]  # contemporary group effect
107 | OPTION anim [ position(s)]  # additive genetic effect
108 | OPTION pe   [ position(s)]  # PE effect
109 | OPTION mat  [ position(s)]  # maternal genetic effect
110 | OPTION hs   [ position(s)]  # herd by sire interaction effect
111 | ~~~~~
112 | 
 113 | Each option defines the position(s) of the specific effect in the `EFFECTS:` block. The positions should be enumerated for all traits. If the effect is missing in a trait, put `0`. The first 2 options, `cg` and `anim`, are mandatory and the others are optional. In the above example, the contemporary group effect is the 1st effect in the `EFFECTS:` block. Because the contemporary group effect is considered in all 4 traits, we put four 1s as `cg 1 1 1 1`. Similarly, the additive genetic effect is the 4th effect for all the traits, so we put `anim 4 4 4 4`; the permanent environmental effect is the 5th effect for all the traits, so we put `pe 5 5 5 5`.
114 | 
115 | ~~~~~{language=blupf90}
116 | OPTION acc_maxrounds n      # n = integer number
117 | OPTION conv_crit x          # x = real small number
118 | ~~~~~
119 | 
120 | The first option defines the maximum number of iterations (the default is 10) and the second defines the convergence criterion (the default is `1.0e-8`). The iterations stop when the program reaches either criterion.
121 | 
 122 | The program needs only a few iterations; too many iterations would add bias to the approximated reliability. The recommendation is to use the default convergence criterion. However, the default number of iterations may not be enough, in which case you should set a larger number for `acc_maxrounds`.
123 | 
124 | ~~~~~{language=blupf90}
125 | OPTION type x   # x = 1.0 or 0.5
126 | ~~~~~
127 | 
 128 | This option defines the type of reliability. The final value is calculated as $(R^2)^x$. The default $x$ is 1.0, which corresponds to $R^2$ (that is, reliability). If you set $x$ to $0.5$, the output is $(R^2)^{0.5} = R$, that is, accuracy. Traditionally, the beef industry has used accuracy ($x$ is 0.5) and the dairy industry has used reliability ($x$ is 1.0).
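
The conversion controlled by `OPTION type` is a simple power transformation. The following Python sketch is only an illustration of the formula $(R^2)^x$ given above (Python is not part of the BLUPF90 suite, and the function name `convert_reliability` is hypothetical). It uses the reliability of the first EBV in the `sol_and_acc` example as input.

```python
def convert_reliability(rel, x=1.0):
    """Return (R^2)^x: reliability for x = 1.0, accuracy for x = 0.5."""
    return rel ** x

rel = 0.8728                            # reliability from the example output
same = convert_reliability(rel)         # x = 1.0 leaves the reliability as is
acc = convert_reliability(rel, 0.5)     # x = 0.5 gives the accuracy R
print(same, round(acc, 4))
```

Because reliability lies between 0 and 1, the accuracy computed this way is never smaller than the reliability.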
129 | 
130 | ~~~~~{language=blupf90}
131 | OPTION parent_avg
132 | ~~~~~
133 | 
 134 | With this option, the program calculates the parent average for each animal. The values are saved in an additional column in `sol_and_acc`.
135 | 
136 | ~~~~~{language=blupf90}
137 | OPTION original_id
138 | ~~~~~
139 | 
140 | This option puts the original ID (the 10th column in the renumbered pedigree file) to `sol_and_acc`.
141 | 


--------------------------------------------------------------------------------
/introduction_difference.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | title: Introduction
 3 | author: Yutaka Masuda
 4 | date: April 2025
 5 | subject: "Introduction to BLUPF90 suite programs"
 6 | tags: [introduction,tutorial]
 7 | ...
 8 | 
 9 | Differences with other software
10 | ===============================
11 | 
12 | It is helpful for new users to understand how BLUPF90 programs differ from other software they may be familiar with. We briefly introduce these differences for your understanding.
13 | 
14 | General differences
15 | -------------------
16 | 
17 | ### R and SAS ###
18 | 
19 | * BLUPF90 can handle very large datasets. It is likely to be faster than R and SAS.
20 | * BLUPF90 is not interactive software. Users must prepare a *parameter file* that describes the details of the analysis (file names, model, genetic parameters, and options). The program simply reads the file and performs the defined task.
21 | * BLUPF90 is not a scripting language. The programs do not provide any features for data manipulation. All editing must be done before running the program. The data file must include all information required for the analysis.
22 | * BLUPF90 can only read text files. Headers (or comments) are not allowed in the data or pedigree files. Items in a text file should be separated by one or more spaces.
23 | * BLUPF90 shows the results on screen and writes them to text files.
 24 | * BLUPF90 accepts only integer values as effect levels in the data and pedigree files, and real values as covariates and trait values. Alphabetic characters or symbols must be replaced with integer codes before analysis. RENUMF90 can assist with this process.
25 | * BLUPF90 has no graphical interface. It is similar to `Rscript`, a command-line utility. The program runs on Command Prompt (also called the 'DOS' window) in Windows, Terminal in macOS, and various shells in Linux.
26 | * BLUPF90 does not provide functions for hypothesis testing. Some programs show $-2\log L$ or similar statistics, which users can use to manually perform likelihood ratio tests.
27 | 
28 | ### ASREML ###
29 | 
30 | * BLUPF90 is free for research and academic purposes, but it offers limited support from the development team.
31 | * BLUPF90 supports Gibbs sampling.
32 | * BLUPF90 supports linear mixed models and threshold models. It does not support generalized linear models.
 33 | * BLUPF90 uses simple text formats for data and pedigree files. Headers, comments, alphabetic characters, and symbols are not allowed in these files. The user must prepare the edited data before analysis.
34 | * BLUPF90 does not offer comprehensive directives for analysis. The parameter file contains minimal information such as file names, model, variance components, and options. BLUPF90 does not use labels to refer to effects in the model description.
35 | * BLUPF90 produces minimal output.
36 | 
37 | ### WOMBAT ###
38 | 
 39 | * In BLUPF90 data files, observations for an animal taken at the same time must be saved in the same line. Storing observations from the same animal in multiple lines is allowed only for repeated records or special multi-trait models (e.g., G-by-E analysis). In WOMBAT, each trait is recorded on a separate line.
40 | * BLUPF90 does not compute inbreeding coefficients by default. The inverse of the numerator relationship matrix ($\mathbf{A}^{-1}$) is created using a non-inbred approach (i.e., Henderson's method, assuming no inbreeding). To consider inbreeding in $\mathbf{A}^{-1}$, users must supply inbreeding coefficients in a special format in the pedigree file.
41 | * BLUPF90 does not use labels to describe the model. Instead, it refers directly to the column positions of effect identifiers.
42 | * BLUPF90 does not create interaction effects. It does not support `*` notation in parameter files. Interactions must be prepared as cross-classified effects before analysis.
43 | * BLUPF90 accepts both positive and negative real numbers as covariates, whereas WOMBAT accepts only integers.
44 | * WOMBAT has more options for maximizing the likelihood (e.g., Derivative-Free, PX, reduced-rank methods), while BLUPF90 supports Bayesian estimation via Gibbs sampling.
45 | * BLUPF90 supports the threshold model. WOMBAT does not.
46 | 
47 | ### VCE ###
48 | 
49 | * BLUPF90 data and pedigree files must not contain headers or comments. The first row must be a data line.
50 | * BLUPF90 uses a very simple parameter file format, which is not as human-readable as that used in VCE. It does not use labels for effects. The parameter order in the file is fixed.
51 | * The pedigree file must contain animal ID, sire ID, and dam ID as the first three columns for the animal model. A fourth column can be added as a special feature.
52 | * BLUPF90 does not compute inbreeding coefficients by default. The inverse of the numerator relationship matrix ($\mathbf{A}^{-1}$) is generated using a non-inbred approach (i.e., Henderson's method). Users who wish to consider inbreeding must provide inbreeding coefficients in a special format in the pedigree file.
53 | 
54 | ### DMU ###
55 | 
56 | * BLUPF90 directly reads data and pedigree files; users do not need to run a DMU1-type program. Data preparation programs can be used optionally for user convenience.
57 | * The parameter file (similar to a driver file) contains model descriptions. BLUPF90 does not use labels for effects.
58 | * BLUPF90 does not differentiate between data types (integer or real). All data are read as real values and internally converted as needed.
59 | * BLUPF90 prints minimal output to the screen. It does not create log files.
60 | 
61 | Differences in purpose
62 | ----------------------
63 | 
64 | BLUPF90 is specialized for estimating BLUE, predicting BLUP, and estimating variance components in linear mixed models. It assumes that the user knows which fixed effects influence the phenotypes prior to the analysis. This is why the programs do not perform hypothesis testing for fixed effects. The goodness-of-fit for random effects can be evaluated using $-2\log L$, which is provided by the REML programs.
65 | 
66 | Difference in software design
67 | -----------------------------
68 | 
69 | The philosophy of the BLUPF90 programs is described in the official wiki and several publications. The basic idea is to support general linear mixed models with minimal programming effort. Fortran 90 makes it easy to write and reuse code. BLUPF90 is the main software that demonstrates this idea, and many other programs have been derived from it.
70 | 
71 | The current programs support genomic analyses, especially for single-step GBLUP. Computation time has been significantly improved in REML, Gibbs sampling, and BLUP with iteration on data, using parallelization and optimized libraries. The development team continues to actively update the programs to implement new ideas and improve stability.
72 | 
73 | BLUPF90 programs rely entirely on user-supplied information. This is intentional, to keep the software as simple as possible. The programs do not automatically create additional effects or covariates for convenience. For example, the general mean is not automatically added as a fixed effect if the model has none. In random regression models, the programs do not generate covariates (e.g., Legendre polynomials or spline functions) automatically. This reminds users of the information actually required in the model.
74 | 


--------------------------------------------------------------------------------
/mrode_c10ex102_marker_information.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Breeding values with marker information
 10 | =======================================
 11 | 
 12 | Models
 13 | ------
 14 | 
 15 | In this example, the author assumes one known genetic marker with the polygenic effect for a trait. The genetic marker accounts for a certain level of genetic variance. Covariances among animals in terms of the additive effects of the marker alleles can be represented as the marker covariance matrix ($\mathbf{G}_v$). Fernando and Grossman (1989) suggested a method to calculate $\mathbf{G}^{-1}_v$ directly from the pedigree list and genotypes.
 16 | 
 17 | In this example, the author assumes a model with the additive genetic (polygenic) effect, the additive genetic effect due to genetic markers, and the random residual effect. The author uses a pre-calculated $\mathbf{G}_v$ to estimate the partial breeding values explained by the marker. Also, the model includes the additive genetic relationship matrix ($\mathbf{A}_u$) which accounts for the "residual" polygenic effect. The mixed model equations are
 18 | $$
 19 | \left[
 20 | \begin{array}{lll}
 21 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{W}\\
 22 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{A}_{u}^{-1}/\sigma_u^{2} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{W}\\
 23 | \mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{Z} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \mathbf{G}_{v}^{-1}/\sigma_v^{2}\\
 24 | \end{array}
 25 | \right]
 26 | \left[
 27 | \begin{array}{c}
 28 | \mathbf{\hat{b}}\\
 29 | \mathbf{\hat{u}}\\
 30 | \mathbf{\hat{v}}
 31 | \end{array}
 32 | \right]
 33 | =
 34 | \left[
 35 | \begin{array}{l}
 36 | \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\
 37 | \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \\
 38 | \mathbf{W}'\mathbf{R}^{-1}\mathbf{y}
 39 | \end{array}
 40 | \right]
 41 | $$
 42 | The variance components are $\sigma_u^2=0.30$, $\sigma_v^2= 0.05$, and $\sigma_e^2 = 0.60$.
 43 | 
 44 | 
 45 | We need to overcome several issues to conduct the analysis. First, BLUPF90 has no function to calculate $\mathbf{G}_{v}^{-1}$. Instead, the program reads a user-supplied file containing the elements of $\mathbf{G}^{-1}_{v}$. Second, the animals in the pedigree are highly inbred. By default, BLUPF90 ignores inbreeding coefficients when calculating $\mathbf{A}^{-1}$. To account for inbreeding in $\mathbf{A}^{-1}$, the user should add an extra column to the pedigree file and modify the parameter file accordingly.
 46 | 
 47 | 
 48 | Files
 49 | -----
 50 | 
 51 | The data file is prepared as shown on p.159 (`data_mr10a.txt`). The QTL allele codes are inserted into columns 5 and 6.
 52 | 
 53 | ~~~~~{language=text caption="data_mr10a.txt"}
 54 |   1 1 0 0  1  2  6.8
 55 | ...
 56 | ~~~~~
 57 | 
 58 | 1. Animal ID (calf)
 59 | 2. Sex (1=male and 2=female)
 60 | 3. Sire ID
 61 | 4. Dam ID
 62 | 5. Paternal QTL allele
 63 | 6. Maternal QTL allele
 64 | 7. Post weaning weight (kg)
 65 | 
 66 | Now we prepare the pedigree file with the additional 4th column to consider inbreeding coefficients for $\mathbf{A}^{-1}$.
 67 | 
 68 | The first 3 columns are animal ID, sire ID, and dam ID as usual. The 4th column is an inb/upg code exclusively used in the BLUPF90 family. The code is a 4-digit integer. It is calculated as
 69 | $$
 70 | \mathrm{inb/upg\ code}=\frac{4000}{(1+m_s)(1-F_s)+(1+m_d)(1-F_d)}
 71 | $$
 72 | where $m_s$ is 0 if the sire is known or 1 if the sire is unknown; $m_d$ is 0 if the dam is known or 1 if the dam is unknown; $F_s$ is the inbreeding coefficient of the sire; and $F_d$ is the inbreeding coefficient of the dam. If the sire (or dam) is unknown, $F_s$ (or $F_d$) is 0. In this case, the inbreeding coefficient of animal 4 is 0.25 (animal 5 has 0.375, but this value is not used here). The inb/upg code ($c$) for each animal is
 73 | $$
 74 | \begin{aligned}
 75 | c_1 &= 4000/[(1 + 1)(1 - 0) + (1 + 1)(1 - 0)] = 1000\\
 76 | c_2 &= 4000/[(1 + 1)(1 - 0) + (1 + 1)(1 - 0)] = 1000\\
 77 | c_3 &= 4000/[(1 + 0)(1 - 0) + (1 + 0)(1 - 0)] = 2000\\
 78 | c_4 &= 4000/[(1 + 0)(1 - 0) + (1 + 0)(1 - 0)] = 2000\\
 79 | c_5 &= 4000/[(1 + 0)(1 - 0.25) + (1 + 0)(1 - 0)] = 2285.7 \approx 2286.
 80 | \end{aligned}
 81 | $$
 82 | RENUMF90 calculates the code and prepares an appropriate pedigree file using the `INBREEDING` keyword. See the manual for details.
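
As a quick check of the formula, the following minimal Python sketch evaluates the inb/upg code from the same inputs ($m_s$, $m_d$, $F_s$, $F_d$) used for the five animals above. The function name `inb_upg_code` is hypothetical and is not part of any BLUPF90 program; the rounding to the nearest integer is an assumption based on the rounded value shown for animal 5.

```python
def inb_upg_code(sire_known, dam_known, f_sire=0.0, f_dam=0.0):
    """Evaluate 4000 / [(1+m_s)(1-F_s) + (1+m_d)(1-F_d)] (hypothetical helper)."""
    m_s = 0 if sire_known else 1
    m_d = 0 if dam_known else 1
    # An unknown parent contributes an inbreeding coefficient of 0.
    f_s = f_sire if sire_known else 0.0
    f_d = f_dam if dam_known else 0.0
    code = 4000.0 / ((1 + m_s) * (1 - f_s) + (1 + m_d) * (1 - f_d))
    return int(code + 0.5)  # assumed: round to the nearest integer

# (sire known?, dam known?, F of sire, F of dam) for animals 1 to 5,
# taken from the terms written in the equations above.
inputs = [
    (False, False, 0.0, 0.0),
    (False, False, 0.0, 0.0),
    (True, True, 0.0, 0.0),
    (True, True, 0.0, 0.0),
    (True, True, 0.25, 0.0),
]
codes = [inb_upg_code(*x) for x in inputs]
print(codes)
```

In practice, RENUMF90 produces these codes, so the sketch is only meant for verifying a hand-made pedigree file.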
 83 | 
 84 | The inverse of the marker covariance matrix ($\mathbf{G}_{v}^{-1}$; p.164) is prepared as the following text file.
 85 | 
 86 | ~~~~~{language=text caption="userinverse_mr10a.txt"}
 87 |   1  1  5.556
 88 |   1  2  1.000
 89 |   1  5 -5.000
 90 | ...
 91 |   8  9 -0.663
 92 |   9  9  6.630
 93 |  10 10  5.556
 94 | ~~~~~
 95 | 
 96 | The file contains 3 columns: row index, column index, and the value (or column index, row index, and the value). BLUPF90 treats a user-supplied inverse of a relationship matrix as symmetric, so the program needs only the upper- or lower-triangular elements together with the diagonal elements. The above file contains the elements on and above the diagonal. Note that elements with a value of 0 do not need to be provided, which is handy when the matrix is sparse.
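
To illustrate the triplet format, here is a minimal Python sketch that expands such a file into a full symmetric matrix. It uses only the six lines shown above (the elided lines are not filled in), so the result is a partial matrix for illustration. The function name `read_triplets` is hypothetical, and this is not how BLUPF90 stores the matrix internally.

```python
# Only the lines shown in the listing above; the elided part is omitted.
sample = """\
  1  1  5.556
  1  2  1.000
  1  5 -5.000
  8  9 -0.663
  9  9  6.630
 10 10  5.556
"""

def read_triplets(text, n):
    """Expand 'row column value' triplets into a dense symmetric n x n matrix."""
    dense = [[0.0] * n for _ in range(n)]
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        i, j, value = line.split()
        i, j, value = int(i), int(j), float(value)
        # Indices in the file start at 1; mirror each element across the diagonal.
        dense[i - 1][j - 1] = value
        dense[j - 1][i - 1] = value
    return dense

ginv = read_triplets(sample, 10)
print(ginv[0][4], ginv[4][0])  # element (1,5) and its mirror (5,1)
```

Elements that do not appear in the file stay at 0, which matches the rule that zero elements can be omitted.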
 97 | 
 98 | The parameter file should be as follows.
 99 | 
100 | ~~~~~{language=blupf90 caption="param_mr10a.txt"}
101 | DATAFILE
102 | data_mr10a.txt
103 | NUMBER_OF_TRAITS
104 | 1
105 | NUMBER_OF_EFFECTS
106 | 4
107 | OBSERVATION(S)
108 | 7
109 | WEIGHT(S)
110 | 
111 | EFFECTS:
112 | 2  2 cross    # fixed effect
113 | 1  5 cross    # additive genetic effect
114 | 5  0 cross    # paternal QTL effect
115 | 6 10 cross    # maternal QTL effect ( total 10 levels combined with paternal effect )
116 | RANDOM_RESIDUAL VALUES
117 | 0.60
118 | RANDOM_GROUP
119 | 2
120 | RANDOM_TYPE   # considering inbreeding
121 | add_an_upginb
122 | FILE
123 | pedigree_mr10a.txt
124 | (CO)VARIANCES
125 | 0.30
126 | RANDOM_GROUP
127 | 4
128 | RANDOM_TYPE   # reading the user-supplied file
129 | user_file
130 | FILE          # its file name
131 | userinverse_mr10a.txt
132 | (CO)VARIANCES
133 | 0.05
134 | OPTION solv_method FSPAK
135 | ~~~~~
136 | 
137 | Pay attention to the following points when writing the parameter file.
138 | 
139 | - `EFFECTS`: Use of the "level 0" technique. The incidence matrix for the QTL marker effects, $\mathbf{W}$, contains two 1s per row. Declaring 0 levels for the paternal QTL effect combines it with the maternal QTL effect on the next line, so both alleles share the same 10 levels. See the social interaction model for the detailed explanation.
140 | - `RANDOM_TYPE` for the additive genetic effect: Use of `add_an_upginb`. The pedigree file contains the 4th column which indicates the inb/upg code. With this keyword, BLUPF90 can read the column and build $\mathbf{A}^{-1}$ with inbreeding. Even though we do not use any unknown parent groups in this analysis, the keyword should be `add_an_upginb`.
141 | - `RANDOM_TYPE` for the marker effect: Use of `user_file` to read the user-supplied file. The name of the file should also be supplied in the `FILE` section.
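
The structure of $\mathbf{W}$ can be sketched in a few lines of Python. The example builds the row of $\mathbf{W}$ for the first data record shown earlier (paternal allele 1 and maternal allele 2), assuming the 10 combined QTL levels declared in the parameter file. The function name `qtl_row` is hypothetical; BLUPF90 builds the equations internally and does not form this row explicitly.

```python
def qtl_row(paternal, maternal, n_levels=10):
    """Return one row of W with a 1 for each of the two QTL alleles."""
    row = [0] * n_levels
    # Both alleles point to the same set of levels, so the row gets two entries.
    row[paternal - 1] += 1
    row[maternal - 1] += 1
    return row

# First record of data_mr10a.txt: columns 5 and 6 are 1 and 2.
row = qtl_row(1, 2)
print(row)
```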
142 | 
143 | 
144 | Solutions
145 | ---------
146 | 
147 | The solutions are identical to the textbook (p.166).
148 | 


--------------------------------------------------------------------------------
/vc_advanced_gs.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Variance component estimation
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Advanced features in Gibbs sampling programs
 10 | ============================================
 11 | 
 12 | Heterogeneous residual variances
 13 | --------------------------------
 14 | 
 15 | GIBBS3F90 supports heterogeneous residual variances defined by a class. Here we will demonstrate an analysis with the heterogeneity in residual variance.
 16 | 
 17 | ### Files ###
 18 | 
 19 | We use the same data and pedigree file as before.
 20 | 
 21 | - [`simdata2.txt`](https://github.com/Masuday/data/blob/master/tutorial/simdata2.txt) : data file
 22 | - [`simped2.txt`](https://github.com/Masuday/data/blob/master/tutorial/simped2.txt) : pedigree file
 23 | 
 24 | The pedigree file contains three columns: animal, sire, and dam. The data file has 14 columns as described below.
 25 | 
 26 | Column Item          type      description
 27 | ------ ---------     -------   ----------------------------------
 28 | 1      Animal ID     integer   Same as in pedigree (4641 animals)
 29 | 2      Sire ID       integer   Same as in pedigree
 30 | 3      Dam ID        integer   Same as in pedigree
 31 | 4      Weight        real      Not used here
 32 | 5      Mu            integer   All 1: not used here
 33 | 6      Farm          integer   Class effect (155 levels)
 34 | 7      Sex           integer   Class effect (2 levels)
 35 | 8      Year          integer   Class effect (11 levels)
 36 | 9      Obs. 1        real      Phenotype for trait 1
 37 | 10     Obs. 2        real      Phenotype for trait 2
 38 | 11     Obs. 3        real      Phenotype for trait 3
 39 | 12     Obs. 4        real      Phenotype for trait 4
 40 | 13     Covariate     real      Not used
 41 | 14     Class         integer   Heterogeneous residual class
 42 | 
 43 | The 14th column contains the heterogeneous-residual-variance class (3 levels) which is used in this section.
 44 | 
 45 | The parameter file has one option to define the heterogeneous residual class.
 46 | 
 47 | ~~~~~{language=blupf90 caption="gibbs2.txt"}
 48 | DATAFILE
 49 | simdata2.txt
 50 | NUMBER_OF_TRAITS
 51 | 1
 52 | NUMBER_OF_EFFECTS
 53 | 4
 54 | OBSERVATION(S)
 55 | 9
 56 | WEIGHT(S)
 57 | 4
 58 | EFFECTS:
 59 |  6   155 cross
 60 |  7     2 cross
 61 |  8    11 cross
 62 |  1  4641 cross
 63 | RANDOM_RESIDUAL VALUES
 64 | 100
 65 | RANDOM_GROUP
 66 | 4
 67 | RANDOM_TYPE
 68 | add_animal
 69 | FILE
 70 | simped2.txt
 71 | (CO)VARIANCES
 72 | 100
 73 | OPTION hetres_int 14 3
 74 | ~~~~~
 75 | 
 76 | The option has 2 arguments.
 77 | 
 78 | - `OPTION hetres_int` = defines the heterogeneous residual class with two values: 1) the position of the class in the data file and 2) the maximum level in the class.
 79 | 
 80 | In this case, we specify that the 14th column has 3 levels.
 81 | 
 82 | With 20,000 samples (saving every 10th sample) and a burn-in of 10,000, the following results are obtained.
 83 | 
 84 | ~~~~~{language=output}
 85 |  ave G
 86 |    39.845
 87 |  SD G
 88 | 	3.9917
 89 |  ave R
 90 | 	72.313
 91 |  SD R
 92 | 	3.9497
 93 |  ave R
 94 |    78.248
 95 |  SD R
 96 |    4.4558
 97 |  ave R
 98 |   80.646
 99 |  SD R
100 |    4.3320
101 | ~~~~~
102 | 
103 | There are 3 `ave R` (and `SD R`) blocks. The first corresponds to the residual variance in heterogeneous class 1, the second to class 2, and so on. Compare the above estimates to the results from AIREMLF90 shown in the previous section.
104 | 
105 | 
106 | Restart the sampling
107 | --------------------
108 | 
109 | After the sampling, you may find that more samples are needed. The Gibbs sampling programs can restart the sampling from the end point of the previous run. All Gibbs samplers in recent BLUPF90 programs support this feature.
110 | 
111 | When the Gibbs sampler finishes sampling, 4 files will be created. To continue the sampling, all the files are required.
112 | 
113 | - `binary_final_solutions` = Posterior means and SDs for location parameters
114 | - `last_solutions` = The last samples for location parameters
115 | - `fort.99` = Values needed for DIC
116 | - `gibbs_samples` = Sampled variance components
117 | 
118 | If you want to restart the sampling, you have to add an option to the parameter file. It needs the number of samples drawn in the previous run. If you had 10000 samples, the option should be
119 | 
120 | ~~~~~{language=blupf90}
121 | OPTION cont 10000
122 | ~~~~~
123 | 
124 | where 10000 is the number of samples obtained previously. Run the Gibbs sampler with a parameter file with the above option, and the program restarts the sampling (from 10001 in the example).
125 | 
126 | 
127 | Extraction of solutions from binary final solutions
128 | ---------------------------------------------------
129 | 
130 | The file `binary_final_solutions` contains the posterior means of the location parameters (that is, the solutions for "fixed" and "random" effects). This file is saved in a binary (non-text) format. The following Fortran program extracts the solutions and writes them to the file `final_solutions.txt`. Compile the program with a Fortran compiler (such as GFortran) and run it in the directory containing `binary_final_solutions`. The output format is equivalent to that of BLUPF90 with `OPTION sol_se`.
131 | 
132 | ~~~~~{language=Fortran caption="\url{binsol_to_textsol.f90}"}
133 | program binsol_to_textsol
134 | 	implicit none
135 | 	integer :: io, t, e, l          ! I/O status; trait, effect, and level
136 | 	double precision :: sol, se     ! posterior mean and posterior SD
137 | 	open(10, file='binary_final_solutions', form='unformatted', &
138 | 		status='old', iostat=io)
139 | 	if(io /= 0) stop 'cannot open binary_final_solutions'
140 | 	open(20, file='final_solutions.txt')
141 | 	write(20,'(" trait / effect level solution               s.e.")')
142 | 	do
143 | 		read(10, iostat=io) t,e,l,sol,se   ! one record per solution
144 | 		if(io /= 0) exit                   ! end of file or read error
145 | 		write(20, '(2i4,i10,2f20.8)') t,e,l,sol,se
146 | 	end do
147 | 	close(10)
148 | 	close(20)
149 | end program binsol_to_textsol
150 | ~~~~~
151 | 
152 | Output of POSTGIBBSF90
153 | ----------------------
154 | 
155 | The POSTGIBBSF90 program reports many diagnostics for the Gibbs samples. Here we only outline what each item means.
156 | 
157 | - `MCE` = Monte Carlo error, corresponding to the "standard error" of the posterior mean of a parameter ($\hat{\mu}-\mu$).
158 | - `Mean` = Posterior mean of a parameter.
159 | - `HPD` = 95% highest posterior density interval; similar in spirit to a "95% confidence interval" in the frequentist approach.
160 | - `Effective sample size` = Number of samples after deducting auto-correlation among samples.
161 | - `Median` = Posterior median of a parameter.
162 | - `Mode` = Posterior mode of a parameter; just an approximation.
163 | - `Independent chain size`
164 | - `PSD` = Posterior standard deviation of a parameter.
165 | - `Mean` = Same as above.
166 | - `PSD Interval (95%)` = Lower and upper bounds of Mean $\pm$ 1.96PSD.
167 | - Geweke diagnostic = Convergence diagnostic; the chain may be regarded as converged if this is $< 1.0$ (according to the official manual, this is almost useless because it is $< 1.0$ in almost all cases).
168 | - `Autocorrelations` = Lag-correlations with lag 1, 10 and 50; calculated for the saved samples.
169 | - `Independent # batches` = The effective number of blocks after deducting the auto-correlation among samples.
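
As a small worked example of the `PSD Interval (95%)` item, the following Python sketch computes Mean $\pm$ 1.96PSD. For illustration only, it reuses the `ave G` and `SD G` values from the GIBBS3F90 output shown earlier in this chapter as the posterior mean and PSD; these are not POSTGIBBSF90 results, and the function name `psd_interval` is hypothetical.

```python
def psd_interval(mean, psd, z=1.96):
    """Return the lower and upper bounds of mean +/- z * PSD."""
    return mean - z * psd, mean + z * psd

# Posterior mean and SD of the genetic variance (ave G and SD G above).
lower, upper = psd_interval(39.845, 3.9917)
print(f"{lower:.3f} {upper:.3f}")
```

This interval assumes a roughly symmetric posterior distribution, so it can differ from the HPD interval when the posterior is skewed.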
170 | 


--------------------------------------------------------------------------------
/genomic_files.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Practical genomic analysis
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Files used in genomic analysis
 10 | ==============================
 11 | 
 12 | As seen in the tutorial in the previous chapter, PREGSF90 needs two files to handle genomic data: an SNP marker file and a cross-reference file. In this section, we describe the detailed format for these two files. Also, we will present some other files optionally needed in quality control of markers.
 13 | 
 14 | PREGSF90 reads the same parameter file as BLUPF90. From this file, the program reads the names of the pedigree file, the marker file, (optionally) the cross-reference file, and some options. The rest of the parameter file (for example, the name of the data file and the model) is effectively ignored.
 15 | 
 16 | 
 17 | SNP file
 18 | --------
 19 | 
 20 | The PREGSF90 program accepts 2 kinds of SNP files. One contains only integers as genotypes (for example, from SNP chips), and the other contains real numbers as gene content (possibly from imputation software, genotyping-by-sequencing, or low-depth sequencing). First, we consider the former, which we call an SNP-marker file here.
 21 | 
 22 | ### SNP marker file ###
 23 | 
 24 | An SNP-marker file is a text file with 2 fields: the animal ID (possibly alphanumeric) in the 1st field and its genotypes in the 2nd field. We again show the first and the last 5 lines of the example data presented in the previous chapter (limited to the first 30 characters of each line to save space).
 25 | 
 26 |      8003   211011112112012000211002
 27 |      8007   212011111111012011111012
 28 |      8016   200120111021012111102121
 29 |      8019   112211021121202111011210
 30 |      8020   211110101120002021002122
 31 |        (skip)
 32 |     13496   101001212012010210212201
 33 |     13497   101112111021020120222111
 34 |     13498   200000220202202000022222
 35 |     13499   011011111111020111121112
 36 |     13500   101020112122021000221001
 37 | 
 38 | The format was already described in the previous chapter.
 39 | 
 40 | 
 41 | ### Gene content file ###
 42 | 
 43 | The PREGSF90 program accepts another format. Here, we call it a _gene content file_. The gene content file may have a number of fields with the following rules.
 44 | 
 45 | - The Animal ID must be in the 1st field and the gene content on each locus must be in the 2nd or later fields.
 46 | - Adjacent gene content can be separated by white spaces. No spaces are also allowed.
 47 | - No headers, no comments, no other fields are allowed. The data must start on the first line.
 48 | - Animal IDs may contain letters, numbers, and symbols (possibly ASCII). The link to the renumbered pedigree is provided in the cross-reference file below.
 49 | - Each gene content value must be a real number written as an integer, a floating-point number (for example `3.14`), or an exponential expression (for example `0.314E+01`). All markers must have the same format.
 50 | - Missing gene content is not allowed, and all animals must have the same number of gene content values. Missing values should be imputed before the analysis.
 51 | - At least 50 gene content values (markers) are required; a file with fewer markers is not accepted.
 52 | 
 53 | 
 54 | Cross-reference (XrefID) file
 55 | -----------------------------
 56 | 
 57 | The BLUPF90 programs need a cross-reference file. This file is automatically generated with RENUMF90. This file relates a renumbered ID to the original ID for genotyped animals. Again, we show the first 5 and the last 5 lines of the actual cross-reference file presented in the previous chapter.
 58 | 
 59 |     6127 8003
 60 |     13570 8007
 61 |     406 8016
 62 |     10802 8019
 63 |     10924 8020
 64 |       (skip)
 65 |     8585 13496
 66 |     8941 13497
 67 |     9369 13498
 68 |     9753 13499
 69 |     9905 13500
 70 | 
 71 | 
 72 | This file simply contains 2 fields: the first is for renumbered ID (same as in the pedigree file) and the second is for the original ID (same as in the marker file). The 2 fields can be separated with at least 1 space. A tab is not allowed. The order of animals must be the same as the SNP file. Again, this file is generated with RENUMF90, and the user should not edit it unless there is a reason to do it.
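
Because the order of animals must agree between the two files, a quick consistency check can be useful after any manual manipulation. The following minimal Python sketch compares the original IDs in the first five lines of the SNP file and the cross-reference file shown above. The function names are hypothetical; this check is not a feature of PREGSF90 or RENUMF90.

```python
# First five lines of the SNP file and the XrefID file shown in this section.
snp_lines = [
    "     8003   211011112112012000211002",
    "     8007   212011111111012011111012",
    "     8016   200120111021012111102121",
    "     8019   112211021121202111011210",
    "     8020   211110101120002021002122",
]
xref_lines = [
    "6127 8003",
    "13570 8007",
    "406 8016",
    "10802 8019",
    "10924 8020",
]

def same_animal_order(snp_lines, xref_lines):
    """True if the original IDs appear in the same order in both files."""
    snp_ids = [line.split()[0] for line in snp_lines]    # 1st field of SNP file
    xref_ids = [line.split()[1] for line in xref_lines]  # 2nd field of XrefID file
    return snp_ids == xref_ids

def contains_tab(lines):
    """True if any line uses a tab, which the cross-reference file must not."""
    return any("\t" in line for line in lines)

order_ok = same_animal_order(snp_lines, xref_lines)
tab_found = contains_tab(xref_lines)
print(order_ok, tab_found)
```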
 73 | 
 74 | 
 75 | Allele frequency file (optional)
 76 | --------------------------------
 77 | 
 78 | An allele frequency file contains the allele frequency of each marker. This file is optional because, by default, the allele frequencies are calculated from the current SNP file. Provide this file only if you want to use allele frequencies from external information. Different allele frequencies may change the properties of $\mathbf{G}$.
 79 | 
 80 | Here is an example file. This contains the allele frequency only for the first 5 markers.
 81 | 
 82 |     1     0.711667
 83 |     2     0.328000
 84 |     3     0.422000
 85 |     4     0.157000
 86 |     5     0.492333
 87 | 
 88 | This file contains 2 columns:
 89 | 
 90 | 1. The position of the marker (from 1 to the maximum number of markers)
 91 | 2. Allele frequency as a real value (ranged from 0 to 1).
 92 | 
 93 | Note that PREGSF90 usually creates this file containing the allele frequency calculated from the current SNP file by default (the file name is `freqdata.count`).
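
The following Python sketch shows one common way to obtain such frequencies from integer genotypes, under the assumption that each genotype is coded as 0, 1, or 2 copies of one allele and that no genotype is missing. In that case the frequency at a marker is the sum of genotypes divided by twice the number of animals. Whether PREGSF90 uses exactly this convention is not stated here, so treat the function `allele_frequencies` as a hypothetical illustration. It uses the first five genotype strings shown earlier in this chapter.

```python
# Genotype strings (first 24 markers) of the first five animals in the SNP file.
genotypes = [
    "211011112112012000211002",
    "212011111111012011111012",
    "200120111021012111102121",
    "112211021121202111011210",
    "211110101120002021002122",
]

def allele_frequencies(genotypes):
    """Frequency per marker, assuming 0/1/2 allele counts and no missing values."""
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    freqs = []
    for k in range(n_markers):
        total = sum(int(g[k]) for g in genotypes)
        freqs.append(total / (2 * n_animals))
    return freqs

freqs = allele_frequencies(genotypes)
# Print in the two-column layout: marker position and frequency.
for position, p in enumerate(freqs[:5], start=1):
    print(f"{position:5d} {p:12.6f}")
```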
 94 | 
 95 | 
 96 | Map file (optional)
 97 | -------------------
 98 | 
 99 | A map file relates a marker to a chromosome, a physical location, and a specific name. This file is needed only for comprehensive quality control and GWAS with ssGBLUP. The file should contain at least 3 fields separated by at least 1 white space. The 4th field is optional and contains the name of a marker. Here we show an example (including the 4th field in this case). Only the first 3 lines are shown.
100 | 
101 |     1  1     127     SNP-CODE-1
102 |     2  1     652     SNP-CODE-2
103 |     3  1    1022     SNP-CODE-3
104 | 
105 | This file follows these rules.
106 | 
107 | - The first 3 fields should contain integer values: the first is a marker number, the second specifies the chromosome number, and the third represents the physical location on the chromosome. The marker number is just an integer, not necessarily consecutive, used for external data manipulation; it is not actually used by the program.
108 | - The sex chromosome (X) can be present, but it should also be coded as an integer. The code of the X chromosome is 0 by default.
109 | - The 4th field (optional) can contain any letters, numbers, and symbols (possibly in ASCII) up to 50 characters.
110 | 
111 | 
112 | Weight for SNP (optional)
113 | -------------------------
114 | 
115 | The PREGSF90 program can create an alternative $\mathbf{G}$ that weights each SNP marker. This weighted matrix is especially useful for GWAS or related analyses with the POSTGSF90 program. If you supply this file, the weighted $\mathbf{G}$ is calculated. Otherwise, all weights are set to 1, that is, no SNP marker receives a specific weight. This file is optional.
116 | 
117 | The file contains only 1 column of real values. Each row corresponds to one SNP marker: the first row contains the weight for marker 1, the second row contains the weight for marker 2, and so on. The number of lines in this file should not exceed the number of markers.
118 | 


--------------------------------------------------------------------------------
/mrode_c12ex123_dominance_inverse.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Numerical examples from Mrode (2014)
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Inversion of the dominance matrix
 10 | =================================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | A rapid algorithm to calculate $\mathbf{D}^{-1}$ (ignoring inbreeding) was developed by Hoeschele and VanRaden (1991). Their method was derived from the parental subclass effects $\mathbf{f}$ and its covariance matrix $\mathbf{F}$. The dominance effect is an interaction of genes, and the effect is inherited through the combination of parents and ancestors rather than through individuals. Therefore, the dominance variation can be described with the combination of ancestral animals. Here we just introduce the idea of parental subclasses. See the original paper or the textbook for the detailed algorithm for the inverse.
 16 | 
 17 | 
 18 | For an animal $i$, its dominance effect $d_i$ can be expressed as
 19 | $$
 20 | d_i = f_{S,D} + \varepsilon
 21 | $$
 22 | where $f_{S,D}$ is the expected dominance effect of many hypothetical full-sibs from the sire $S$ and the dam $D$, and $\varepsilon$ is the Mendelian sampling deviation. We assume $\mathrm{var}( f_{S,D} ) = \sigma_f^2$, which is equivalent to the dominance covariance among full-sibs. Therefore, $\sigma_f^2=0.25\sigma_d^2$ and $\mathrm{var}(\varepsilon)  = 0.75\sigma_d^2$. The parental dominance effect actually consists of several components:
 23 | $$
 24 | f_{S,D} = 0.50( f_{S, SD} + f_{S, DD} + f_{SS, D} + f_{DS, D}) - 0.25( f_{SS, SD} + f_{SS, DD} + f_{DS, SD} + f_{DS, DD}) + e
 25 | $$
 26 | where $SS$ and $DS$ refer to the sire and the dam of the sire, respectively, and $SD$ and $DD$ refer to the sire and the dam of the dam, respectively. We have to put a label on each combination of animals (subclass). A dominance pedigree file contains the labels of the above 8 subclasses for each $f_{S,D}$. The vector $\mathbf{f}$ contains the parental dominance effects and its variance is
 27 | $$
 28 | \mathrm{var}(\mathbf{f}) = \mathbf{F}\sigma_f^2
 29 | $$
 30 | and its inverse is rapidly calculated using the dominance pedigree file. The final inverse $\mathbf{D}^{-1}$ is also calculated with $\mathbf{F}^{-1}$.
 31 | 
 32 | BLUPF90 calculates $\mathbf{F}^{-1}$ only. This matrix itself cannot be used to estimate individual dominance effects. However, $\mathbf{F}^{-1}$ is still useful for estimating the variance components. In this section, we prepare the files as usual even though we will not solve the mixed model equations.
 33 | 
 34 | 
 35 | Files
 36 | -----
 37 | 
 38 | <!--
 39 | First we prepare the dominance pedigree file. According to the textbook (p.213), there are 6 known
 40 | subclasses for parental dominance ($f_{S,D}$) to be considered in $\mathbf{F}$. The following table consists of the
 41 | selected elements of the table.
 42 | 
 43 |  Sire ($S$)     Dam ($D$)    $\phi$     Known parent subclasses
 44 | ------------   -----------  --------    -----------------------
 45 |     6             8           1         2, 3, 6
 46 |     6             5           2         3, 6
 47 |     3             8           3         6
 48 |     3             4           4
 49 |     1             2           5
 50 |     3             5           6
 51 | 
 52 | The subclass is referred with the label ($\phi$). For example, $f_{6,8}$ is the subclass 1, $f_{6,5}$ is the subclass
 53 | 2 and so on. The known parent subclasses are the components of the parental dominance. For
 54 | example, in the subclass 1 with $S = 6$ and $D = 8$, $f_{6,8}$ consists of the subclasses 2, 3 and 6. In this
 55 | case, the subclass 2 is $f_{6,5}$ that is $fS$, $DD$ for $f_{6,8}$. Similarly, the subclass 3 is $f_{3,8}$ that is $f_{SS,D}$ for $f_{6,8}$ and
 56 | the subclass 6 is $f_{3,5}$ that is $f_{SS,DD}$ for $f_{6,8}$. Based on the above equation, the subclass 1 can be
 57 | described as
 58 | $$
 59 | f_{6,8} = 0.50( f_{S,DD} + f_{SS,D} ) - 0.25 f_{SS,DD}
 60 | $$
 61 | and missing components are ignored because of no contribution for the results.
 62 | -->
 63 | 
 64 | The dominance pedigree file for BLUPF90 contains 10 columns.
 65 | 
 66 | 1. Parental dominance subclass ($\phi$).
 67 | 2. Subclass $S, SD$ (code:1)
 68 | 3. Subclass $S, DD$ (code:2)
 69 | 4. Subclass $SS, D$ (code:4)
 70 | 5. Subclass $DS, D$ (code:8)
 71 | 6. Subclass $SS, SD$ (code:16)
 72 | 7. Subclass $SS, DD$ (code:32)
 73 | 8. Subclass $DS, SD$ (code:64)
 74 | 9. Subclass $DS, DD$ (code:128)
 75 | 10. Sum of the code for nonempty subclasses.
 76 | 
 77 | If a subclass is unknown or empty, put 0. The last column contains the sum of the code numbers of all nonempty subclasses, so the final code ranges from 0 to 255. For example, for $\phi = 1$, that is, $f_{6,8}$, the final code is $2 + 4 + 32 = 38$. If the final code is 0, you can omit the line.
 78 | 
 79 | The dominance pedigree file for this example is shown below.
 80 | 
 81 | ~~~~~{language=text caption="dominance_mr12b.txt"}
 82 | 1 0 2 3 0 0 6 0 0  38
 83 | 2 0 0 6 0 0 0 0 0   4
 84 | 3 0 6 0 0 0 0 0 0   2
 85 | ~~~~~
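
The final code in the 10th column can be verified with a short Python sketch. It sums the code numbers (1, 2, 4, ..., 128) of the nonempty subclasses in columns 2 to 9 and compares the result with the value listed in each line of the file above. The function name `final_code` is hypothetical.

```python
# Code numbers for columns 2 to 9 of the dominance pedigree file.
CODES = (1, 2, 4, 8, 16, 32, 64, 128)

def final_code(subclasses):
    """Sum the code numbers of the nonempty (nonzero) subclasses."""
    return sum(code for code, label in zip(CODES, subclasses) if label != 0)

lines = [
    "1 0 2 3 0 0 6 0 0  38",
    "2 0 0 6 0 0 0 0 0   4",
    "3 0 6 0 0 0 0 0 0   2",
]

results = []
for line in lines:
    fields = [int(x) for x in line.split()]
    phi, subclasses, listed = fields[0], fields[1:9], fields[9]
    results.append((phi, final_code(subclasses), listed))
    print(phi, final_code(subclasses), listed)
```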
 86 | 
 87 | The data file has to contain the parental subclass labels. The 4th column is the subclass label.
 88 | 
 89 | ~~~~~{language=text caption="data_mr12b.txt"}
 90 |  5  2  17    5
 91 | ...
 92 | ~~~~~
 93 | 
 94 | The pedigree file is the same as the previous one (`pedigree_mr12b.txt`).
 95 | 
 96 | The parameter file is shown below.
 97 | 
 98 | ~~~~~{language=blupf90 caption="param_mr12b.txt"}
 99 | DATAFILE
100 | data_mr12b.txt
101 | NUMBER_OF_TRAITS
102 | 1
103 | NUMBER_OF_EFFECTS
104 | 3
105 | OBSERVATION(S)
106 | 3
107 | WEIGHT(S)
108 | 
109 | EFFECTS:
110 | 2  2 cross   # fixed effect
111 | 1 12 cross   # additive genetic
112 | 4  6 cross   # parental dominance
113 | RANDOM_RESIDUAL VALUES
114 | 180.0
115 | RANDOM_GROUP
116 | 2
117 | RANDOM_TYPE
118 | add_animal
119 | FILE
120 | pedigree_mr12b.txt
121 | (CO)VARIANCES
122 | 90.0
123 | RANDOM_GROUP
124 | 3
125 | RANDOM_TYPE
126 | par_domin
127 | FILE
128 | dominance_mr12b.txt
129 | (CO)VARIANCES
130 | 20.0
131 | OPTION solv_method FSPAK
132 | ~~~~~
133 | 
134 | With the keyword `par_domin`, BLUPF90 creates $\mathbf{F}^{-1}$ from the dominance pedigree file. Note that the parental subclass variance is one quarter of the dominance variance ($\sigma_f^2=\sigma_d^2/4 = 80/4 = 20$). The remaining three quarters ($0.75 \times 80 = 60$) go into the residual variance ($120 + 60 = 180$).
135 | 
136 | You can run BLUPF90 with this parameter file. The solutions for the fixed effects (BLUE) and the additive genetic effects (BLUP) are similar to the previous results. The predictions for parental dominance are not equivalent to the previous results because this analysis considers only one quarter of the dominance variance through $\mathbf{F}^{-1}$. For precise prediction of the dominance effects, use software that fully supports the dominance effect. As mentioned before, however, the parameter file is still useful for variance component estimation with a dominance model.
137 | 
138 | What is the most efficient way to create a dominance pedigree file? The renumbering program RENDOMN can generate the file. This program is based on an old design, and its usage differs from that of RENUMF90. A solver supporting a dominance model is JAADOMN, which is also old software. You can find the programs at <http://nce.ads.uga.edu/~ignacy/numpub/dominance/>.
139 | 


--------------------------------------------------------------------------------
/largescale_reml.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Large-scale genetic evaluation
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | REML estimation with large data
 10 | ===============================
 11 | 
 12 | Background
 13 | ----------
 14 | 
 15 | In typical REML computations, there are several bottlenecks. First, the exact solutions of the mixed model equations with the current variance components are needed. This means we should use Gaussian elimination or similar methods (direct methods). The Cholesky factorization is often employed to solve the equations. The LHS of mixed model equations generally contains many zero elements, and the matrix is sparse. Although we can hold only the nonzero elements to save the memory, very complicated operations are needed for the factorization.
 16 | 
 17 | Secondly, the REML algorithms use the first derivative of the restricted log-likelihood, which contains the selected elements of the inverse of LHS. In the animal model, the derivative contains the following trace term:
 18 | $$
 19 | \mathrm{tr}(\mathbf{A}^{-1}\mathbf{C}^{uu})
 20 | $$
 21 | where $\mathbf{A}^{-1}$ is the inverse of the numerator relationship matrix and $\mathbf{C}^{uu}$ is the submatrix of the inverse of the left-hand side (LHS) of the mixed model equations corresponding to the solutions for the additive genetic effect ($\hat{\mathbf{u}}$). Simply speaking, the REML equations require the inverse of the LHS. The inverse of a sparse matrix is usually dense (non-sparse). You can easily imagine that its computational and storage costs are extremely high.
 22 | 
 23 | In this calculation, we do not actually need the whole inverse. We just need the selected elements of the inverse corresponding to the nonzero elements in the original $\mathbf{C}$. This selected subset is called the sparse inverse in the animal breeding literature. The so-called Takahashi algorithm can calculate the sparse inverse. This algorithm works on the Cholesky factor of the LHS.
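
To see why only selected elements are needed, note that $\mathrm{tr}(\mathbf{A}^{-1}\mathbf{C}^{uu}) = \sum_i \sum_j a^{ij} c^{uu}_{ji}$, so an element of $\mathbf{C}^{uu}$ contributes only where the corresponding element of $\mathbf{A}^{-1}$ is nonzero. The following Python sketch checks this with small, hypothetical matrices (the numbers are arbitrary and do not come from any data set): the trace from the full matrix product equals the sum over the nonzero positions of the sparse matrix.

```python
# Hypothetical sparse symmetric matrix standing in for A^{-1},
# stored as {(row, column): value} for nonzero elements only.
a_inv = {
    (0, 0): 2.0, (0, 1): -1.0,
    (1, 0): -1.0, (1, 1): 2.0, (1, 2): -1.0,
    (2, 1): -1.0, (2, 2): 2.0, (2, 3): -1.0,
    (3, 2): -1.0, (3, 3): 1.0,
}

# Hypothetical dense symmetric matrix standing in for C^{uu}.
c_uu = [
    [0.9, 0.3, 0.2, 0.1],
    [0.3, 0.8, 0.4, 0.2],
    [0.2, 0.4, 0.7, 0.5],
    [0.1, 0.2, 0.5, 0.6],
]
n = len(c_uu)

# Trace using only the elements of C^{uu} at the nonzero positions of A^{-1}.
trace_sparse = sum(value * c_uu[j][i] for (i, j), value in a_inv.items())

# Reference: form the full product A^{-1} C^{uu} and sum its diagonal.
dense = [[a_inv.get((i, j), 0.0) for j in range(n)] for i in range(n)]
trace_full = sum(
    sum(dense[i][k] * c_uu[k][i] for k in range(n)) for i in range(n)
)

print(trace_sparse, trace_full, len(a_inv), n * n)
```

Here only 10 of the 16 elements of the dense matrix are touched. In a real animal model the saving is far larger because $\mathbf{A}^{-1}$ is very sparse.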
 24 | 
 25 | FSPAK is a successful computer package to perform sparse operations, including the factorization and sparse inversion, for the LHS of mixed model equations. This package is the default solver in REMLF90 and AIREMLF90 (now BLUPF90+), and also in BLUPF90/BLUPF90+ with `OPTION solv_method FSPAK`. It is still useful for small equations although the back-end subroutines were written more than 20 years ago. Unfortunately, the old design is inefficient for larger equations with many dense blocks. For example, a multiple-trait model (or a random regression/maternal model) contains many (genetic and residual) covariance matrices in the equations. Although each covariance matrix is small, the small matrices are combined into large dense blocks during the sparse operations. This is even more typical of a genomic model, which includes the (inverse of the) genomic relationship matrix, which is large and dense.
 26 | 
 27 | YAMS is a replacement for FSPAK. This package implements several advanced algorithms, including the supernodal factorization and the inverse multifrontal approach, to efficiently handle the dense blocks in the sparse matrix. YAMS makes intensive use of BLAS and LAPACK. For technical details, see Masuda et al. (2014) and Masuda et al. (2015).
 28 | 
 29 | AIREMLF90 (now BLUPF90+) also implements several options to accelerate the computation and to stabilize the estimation process. We introduce these options here as well.
 30 | 
 31 | 
 32 | Files
 33 | -----
 34 | 
 35 | We will use the following files.
 36 | 
 37 | - [`simdata_rep.txt`](https://github.com/Masuday/data/blob/master/tutorial/simdata_rep.txt) : data file
 38 | - [`simped.txt`](https://github.com/Masuday/data/blob/master/tutorial/simped.txt) : pedigree file
 39 | 
 40 | We will use a 4-trait repeatability model with the following parameter file.
 41 | 
 42 | ~~~~~{language=blupf90 caption="complicatedparam1.txt"}
 43 | DATAFILE
 44 | simdata_rep.txt
 45 | NUMBER_OF_TRAITS
 46 | 4
 47 | NUMBER_OF_EFFECTS
 48 | 5
 49 | OBSERVATION(S)
 50 | 9 10 11 12
 51 | WEIGHT(S)
 52 | 
 53 | EFFECTS: POSITIONS_INDATAFILE NUMBER_OF_LEVELS TYPE_OF_EFFECT  [ EFFECT NESTED ]
 54 | 6 6 6 6   155 cross
 55 | 7 7 7 7     2 cross
 56 | 8 8 8 8    11 cross
 57 | 1 1 1 1  4641 cross
 58 | 1 1 1 1  4641 cross
 59 | RANDOM_RESIDUAL VALUES
 60 | 100 80 80 80
 61 | 80 100 80 80
 62 | 80 80 100 80
 63 | 80 80 80 100
 64 | RANDOM_GROUP
 65 | 4
 66 | RANDOM_TYPE
 67 | add_animal
 68 | FILE
 69 | simped.txt
 70 | (CO)VARIANCES
 71 | 100 80 80 80
 72 | 80 100 80 80
 73 | 80 80 100 80
 74 | 80 80 80 100
 75 | RANDOM_GROUP
 76 | 5
 77 | RANDOM_TYPE
 78 | diagonal
 79 | FILE
 80 | 
 81 | (CO)VARIANCES
 82 | 100 80 80 80
 83 | 80 100 80 80
 84 | 80 80 100 80
 85 | 80 80 80 100
 86 | OPTION use_yams
 87 | OPTION EM-REML 10
 88 | ~~~~~
 89 | 
 90 | Here we use 2 options.
 91 | 
 92 | ~~~~~{language=blupf90}
 93 | OPTION use_yams
 94 | ~~~~~
 95 | 
 96 | This option switches the sparse library from FSPAK to YAMS. If the program is linked with multi-threaded BLAS and LAPACK, the computations will be parallelized. This option is also effective in BLUPF90 in combination with `OPTION solv_method FSPAK`; in that case, the program uses YAMS instead of FSPAK.
 97 | 
 98 | ~~~~~{language=blupf90}
 99 | OPTION EM-REML n   # default :0
100 | ~~~~~
101 | 
102 | This option forces the program to perform EM REML (equivalent to REMLF90) in the first n rounds. The default is 0; that is, no EM rounds are performed and the AI rounds start from round 1. For complicated models, AI REML often diverges in the first round. Although this option can remedy the situation, it does not always prevent the program from diverging.
103 | 
104 | We have several more options to accelerate the computations.
105 | 
106 | ~~~~~{language=blupf90}
107 | OPTION fact_once x
108 | ~~~~~
109 | 
110 | This option avoids the re-calculation of the Cholesky factor. It saves the Cholesky factor in temporary memory (if `x` is `memory`) or a temporary file (if `x` is `file`). If you have enough memory, `memory` is preferable because of its faster computations.
111 | 
112 | ~~~~~{language=blupf90}
113 | OPTION approx_loglike
114 | ~~~~~
115 | 
116 | This skips the computation of the exact log-likelihood. If you do not need the exact value, we recommend using this option for speed-up.
117 | 
118 | 
119 | Results
120 | -------
121 | 
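122 | The following results will be available after 17 rounds. The first `new G` block is for the additive genetic effect (effect 4) and the second is for the permanent environmental effect (effect 5).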
123 | 
124 | ~~~~~{language=output}
125 |  new R
126 |    45.124       22.357       18.626       13.762
127 |    22.357       44.210       22.690       18.016
128 |    18.626       22.690       46.101       22.795
129 |    13.762       18.016       22.795       45.274
130 |  new G
131 |    41.967       22.512       24.058       26.907
132 |    22.512       17.489       19.738       24.257
133 |    24.058       19.738       28.775       34.741
134 |    26.907       24.257       34.741       56.668
135 |  new G
136 |    15.546       15.237       10.490       1.4105
137 |    15.237       40.937       19.562       6.4840
138 |    10.490       19.562       27.782       4.5379
139 |    1.4105       6.4840       4.5379       2.4410
140 | ~~~~~
141 | 
142 | You can try an alternative parameter file without `use_yams` and `EM-REML`. Without YAMS, the computations will be slow. Without EM-REML, the estimates diverge in the 1st round. You can also try `fact_once` and `approx_loglike`. Although you may not see a clear speed-up from these options in this small example, they reduce the computing time in a larger analysis.
143 | 
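144 | As a sketch of how the options in this chapter fit together, the option lines at the end of `complicatedparam1.txt` could be extended as follows. We assume here that all four options can be given in the same parameter file and that `fact_once` and `approx_loglike` are used together with `use_yams`. The value `memory` is an arbitrary choice; replace it with `file` if memory is limited.
145 | 
146 | ~~~~~{language=blupf90}
147 | # sketch: assumes these four options can be combined in one parameter file
148 | OPTION use_yams
149 | OPTION EM-REML 10
150 | OPTION fact_once memory
151 | OPTION approx_loglike
152 | ~~~~~
153 | 
154 | With `approx_loglike`, the exact log-likelihood is not computed, so this combination is suitable only when you do not need the exact value.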


--------------------------------------------------------------------------------
/renum_genomic.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Data preparation with RENUMF90
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Genomic model with SNP-marker file
 10 | ==================================
 11 | 
 12 | Required files
 13 | --------------
 14 | 
 15 | The single-step GBLUP (ssGBLUP) model is an extension of the animal model. RENUMF90 can handle ssGBLUP using a parameter file similar to the one for the animal model, with an additional option. Let us use the previous files. The following files are the same as the previous ones but have different names.
 16 | 
 17 | ~~~~~{language=text caption="rawdata4.txt"}
 18 |   ID006  A  1  1.0  3.0
 19 |   ID009  A  2  1.0  2.0
 20 |   ID012  A  1  2.0  4.0
 21 |   ID007  B  2  2.0  6.0
 22 |   ID010  B  1  1.0  3.0
 23 |   ID013  B  2  2.0  6.0
 24 |   ID008  C  1  2.0  6.0
 25 |   ID011  C  2  1.0  6.0
 26 |   ID014  C  1  1.0  8.0
 27 |   ID015  C  2  2.0  4.0
 28 | ~~~~~
 29 | 
 30 | The pedigree file is as follows.
 31 | 
 32 | ~~~~~{language=text caption="rawpedigree4.txt"}
 33 |  ID001      0      0
 34 |  ID002      0      0
 35 |  ID003      0      0
 36 |  ID004      0      0
 37 |  ID005      0      0
 38 |  ID006      0      0
 39 |  ID007  ID002  ID005
 40 |  ID008  ID001  ID004
 41 |  ID009  ID002  ID003
 42 |  ID010  ID007  ID006
 43 |  ID011  ID007  ID004
 44 |  ID012  ID011  ID008
 45 |  ID013  ID011  ID010
 46 |  ID014  ID009  ID013
 47 |  ID015  ID011  ID010
 48 | ~~~~~
 49 | 
 50 | An SNP-marker file should be prepared as a fixed-length text file with 2 columns: the original ID for column 1 and genotypes for column 2. See the previous chapter for the detailed explanation about the marker-file format. We use the same file shown before.
 51 | 
 52 | ~~~~~{language=widetext caption="snp4.txt"}
 53 | ID002 1212020111001211212000102001202022201112000110020102011001012212110122100001210101112200210010022220
 54 | ID004 2222020200110122210121001100022221102111000010011202020002002222000222000020120000020200200000200220
 55 | ID006 2122020101100221121111212100112111112212000101012211111001201211112022100201220110022200220020111120
 56 | ID010 2111121101111210121211112210202021211211000110021211111001101211101122100111221000022200120020101120
 57 | ID011 2211121200121111220211011210112121111201000120011202020002002222000222000021221000021200110010101220
 58 | ID012 2222020200110122210121001100022221102111000010011202020002002222000222000020120000020200200000200220
 59 | ID013 2122020101210121121222101200112122102212000010012211111001101211101122100110220000021200210010200120
 60 | ID015 2122020101210121121222101200112122102212000010012211111001101211101122100110220000021200210010200120
 61 | ~~~~~
 62 | 
 63 | The marker file is never altered or duplicated by RENUMF90. The program just reads the 1st column (the original ID) and makes a table relating the renumbered animal ID to the original animal ID. The table will be saved as a cross-reference file.
 64 | 
 65 | We assume the same animal model described before except that the genomic information is included. The following is a parameter file to handle the SNP file.
 66 | 
 67 | ~~~~~{language=renumf90 caption="renum4.txt"}
 68 | DATAFILE
 69 | rawdata4.txt
 70 | TRAITS
 71 | 5
 72 | FIELDS_PASSED TO OUTPUT
 73 | 
 74 | WEIGHT(S)
 75 | 
 76 | RESIDUAL_VARIANCE
 77 | 2.0
 78 | EFFECT           # 1st effect fixed
 79 | 2 cross alpha
 80 | EFFECT           # 2nd effect fixed
 81 | 3 cross alpha
 82 | EFFECT           # 3rd effect fixed
 83 | 4 cov
 84 | EFFECT           # 4th effect
 85 | 1 cross alpha
 86 | RANDOM           ## treated as a random effect
 87 | animal
 88 | FILE             ## pedigree file
 89 | rawpedigree4.txt
 90 | FILE_POS         ## animal, sire and dam IDs with two 0s
 91 | 1 2 3 0 0
 92 | SNP_FILE         ## SNP marker file
 93 | snp4.txt
 94 | (CO)VARIANCES    ## its variance component
 95 | 0.5
 96 | ~~~~~
 97 | 
 98 | A new keyword `SNP_FILE` with the name of marker file tells RENUMF90 to properly treat the marker file. This keyword should be placed just after `FILE_POS`.
 99 | 
100 | 
101 | Renumbered files
102 | ----------------
103 | 
104 | When RENUMF90 runs with the above instruction file, it generates several files in the same folder (directory). With this example, RENUMF90 will generate 5 files: `renf90.dat`, `renf90.par`, `renf90.tables`, `renadd04.ped`, and `snp4.txt_XrefID`. The last one is the cross-reference file (or simply, the XrefID file), which contains 2 columns for the renumbered ID and the original ID as follows.
105 | 
106 | ~~~~~{language=text}
107 | 11 ID002
108 | 12 ID004
109 | 2 ID006
110 | 5 ID010
111 | 7 ID011
112 | 8 ID012
113 | 9 ID013
114 | 1 ID015
115 | ~~~~~
116 | 
117 | The order of the genotyped animals is the same as in the marker file. The name of this file is automatically determined and fixed as the original SNP file name plus `_XrefID`, for example, `snp.txt_XrefID` for a marker file `snp.txt`. You cannot change the file name.
118 | 
119 | The renumbered parameter file `renf90.par` looks similar to the previous one except for the option found in the last line.
120 | 
121 | ~~~~~{language=blupf90}
122 | OPTION SNP_file snp4.txt
123 | ~~~~~
124 | 
125 | The parameter file doesn't refer to the cross-reference file because, by default, BLUPF90 programs read the standard XrefID file (SNP file name + `_XrefID`). Usually, you do not have to change the name of the cross-reference file, so you can keep this option line as it is. If you do rename the XrefID file, you have to add the XrefID file name to the option line by hand (as shown in the quick tour).
126 | 
127 | The renumbered pedigree is different from the previous one.
128 | 
129 | ~~~~~{language=text}
130 | 1 7 5 1 0 12 1 0 0 ID015
131 | 12 0 0 3 0 10 0 0 2 ID004
132 | 13 0 0 3 0 0 0 0 1 ID005
133 | 2 0 0 3 0 10 1 0 1 ID006
134 | 3 11 13 1 0 2 1 2 0 ID007
135 | 4 14 12 1 0 2 1 0 1 ID008
136 | 5 3 2 1 0 12 1 0 2 ID010
137 | 6 11 15 1 0 2 1 1 0 ID009
138 | 7 3 12 1 0 12 1 3 0 ID011
139 | 8 7 4 1 0 12 1 0 0 ID012
140 | 14 0 0 3 0 0 0 1 0 ID001
141 | 9 7 5 1 0 12 1 0 1 ID013
142 | 11 0 0 3 0 10 0 2 0 ID002
143 | 10 6 9 1 0 2 1 0 0 ID014
144 | 15 0 0 3 0 0 0 0 1 ID003
145 | ~~~~~
146 | 
147 | With genomics, RENUMF90 assigns new integer values to animals with the following rules.
148 | 
149 | 1. First, the program assigns the smallest numbers to animals with record(s). The order of the assigned numbers will be random (that is, not following the order found in the data file).
150 | 2. Secondly, the program assigns the next larger numbers to genotyped animals without records. The order will be random (that is, not following the order found in the marker file).
151 | 3. Lastly, the program assigns the largest numbers to animals found only in the pedigree. The order will be random (that is, not sorted in any way).
152 | 
153 | You can find the genotyped animals in the renumbered pedigree file. When an animal is genotyped, the 6th column will be 10 or larger.
154 | 
155 | 
156 | Summary
157 | -------
158 | 
159 | - RENUMF90 supports ssGBLUP.
160 | - The instruction file for ssGBLUP is the same as the one for the animal model except for the additional `SNP_FILE` keyword and the name of the marker file.
161 | - RENUMF90 doesn't change the SNP marker file. Instead, it creates a cross-reference file relating the original ID to the renumbered ID. The name of the cross-reference file ends with `_XrefID` by default.
162 | - The generated `renf90.par` contains an option line with the SNP file name.
163 | - The order of animals in the pedigree is determined as phenotyped animals first, genotyped animals second, and the other animals last.
164 | 
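165 | As an illustration of the renamed cross-reference file mentioned above, the option line in `renf90.par` might look like the following. This is a sketch only: the file name `myxref4.txt` is hypothetical, and we assume that the XrefID file name is given as the second argument after the SNP file name. See the quick tour for the authoritative form.
166 | 
167 | ~~~~~{language=blupf90}
168 | # assumption: the 2nd argument is the renamed XrefID file (hypothetical name)
169 | OPTION SNP_file snp4.txt myxref4.txt
170 | ~~~~~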


--------------------------------------------------------------------------------
/renum_mt.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Data preparation with RENUMF90
  3 | author: Yutaka Masuda
  4 | date: April 2025
  5 | subject: "Introduction to BLUPF90 suite programs"
  6 | tags: [introduction,tutorial]
  7 | ...
  8 | 
  9 | Multiple-trait models
 10 | =====================
 11 | 
 12 | Model
 13 | -----
 14 | 
 15 | For renumbering with a multiple-trait model, we need an instruction file very similar to the one for the single-trait model. The only differences are 1) the need to specify the model for each trait and 2) the use of a covariance matrix instead of a scalar value.
 16 | 
 17 | We will consider a two-trait model with equal design matrices, which means all traits have the same model. The mathematical model is
 18 | $$
 19 | \begin{aligned}
 20 | y_{ijk:1} &= A_{i:1} + S_{j:1} + \beta_{1} x_{ijk:1} + u_{k:1} + e_{ijk:1}\\
 21 | y_{ijk:2} &= A_{i:2} + S_{j:2} + \beta_{2} x_{ijk:2} + u_{k:2} + e_{ijk:2}
 22 | \end{aligned}
 23 | $$
 24 | where $y_{ijk:t}$ is an observation for trait $t$, $A_{i:t}$ is a fixed effect for trait $t$, $S_{j:t}$ is another fixed effect for trait $t$, $\beta_{t}$ is the fixed regression coefficient for trait $t$, $x_{ijk:t}$ is a covariate for trait $t$, $u_{k:t}$ is the additive genetic effect for trait $t$, and $e_{ijk:t}$ is the random residual effect. The genetic ($\mathbf{G}_0$) and residual ($\mathbf{R}_0$) covariance matrices are
 25 | $$
 26 | \mathbf{G}_{0}
 27 | =
 28 | \left[
 29 | \begin{array}{rr}
 30 | 0.50&-0.25\\
 31 | -0.25&1.00
 32 | \end{array}
 33 | \right]
 34 | \quad
 35 | \text{and}
 36 | \quad
 37 | \mathbf{R}_{0}
 38 | =
 39 | \left[
 40 | \begin{array}{rr}
 41 | 2.0&1.0\\
 42 | 1.0&1.5
 43 | \end{array}
 44 | \right]
 45 | $$
 46 | 
 47 | 
 48 | Required files
 49 | --------------
 50 | 
 51 | The raw data file is identical to the previous single-trait data except for the new column (trait 2) added. The data file should contain all required effects and observations for all traits.
 52 | 
 53 | ~~~~~{language=text caption="rawdata5.txt"}
 54 |   ID006  A  1  1.0  3.0  4.5
 55 |   ID009  A  2  1.0  2.0  7.5
 56 |   ID012  A  1  2.0  4.0  3.5
 57 |   ID007  B  2  2.0  6.0 -0.5
 58 |   ID010  B  1  1.0  3.0  5.5
 59 |   ID013  B  2  2.0  6.0  1.5
 60 |   ID008  C  1  2.0  6.0 -1.5
 61 |   ID011  C  2  1.0  6.0  2.5
 62 |   ID014  C  1  1.0  8.0  0.5
 63 |   ID015  C  2  2.0  4.0  4.5
 64 | ~~~~~
 65 | 
 66 | The pedigree file is also the same as the previous one.
 67 | 
 68 | ~~~~~{language=text caption="rawpedigree5.txt"}
 69 | ID001        0         0
 70 | ID002        0         0
 71 | ID003        0         0
 72 | ID004        0         0
 73 | ID005        0         0
 74 | ID006        0         0
 75 | ID007    ID002     ID005
 76 | ID008    ID001     ID004
 77 | ID009    ID002     ID003
 78 | ID010    ID007     ID006
 79 | ID011    ID007     ID004
 80 | ID012    ID011     ID008
 81 | ID013    ID011     ID010
 82 | ID014    ID009     ID013
 83 | ID015    ID011     ID010
 84 | ~~~~~
 85 | 
 86 | The instruction file for this two-trait model is as follows.
 87 | 
 88 | ~~~~~{language=renumf90 caption="renum5.txt"}
 89 | DATAFILE
 90 | rawdata5.txt
 91 | TRAITS          # two-trait model: put 2 positions
 92 | 5 6
 93 | FIELDS_PASSED TO OUTPUT
 94 | 
 95 | WEIGHT(S)
 96 | 
 97 | RESIDUAL_VARIANCE
 98 | 2.0 1.0
 99 | 1.0 1.5
100 | EFFECT           # 1st effect fixed
101 | 2 2 cross alpha
102 | EFFECT           # 2nd effect fixed
103 | 3 3 cross alpha
104 | EFFECT           # 3rd effect fixed
105 | 4 4 cov
106 | EFFECT           # 4th effect
107 | 1 1 cross alpha
108 | RANDOM           ## treated as a random effect
109 | animal
110 | FILE             ## pedigree file
111 | rawpedigree5.txt
112 | FILE_POS         ## animal, sire and dam IDs, and two 0s
113 | 1 2 3 0 0
114 | (CO)VARIANCES    ## its variance component
115 |  0.50 -0.25
116 | -0.25  1.00
117 | ~~~~~
118 | 
119 | For a multiple-trait model, check the following points in the instruction file. An incorrect description in these statements causes RENUMF90 to stop suddenly.
120 | 
121 | - `TRAITS`: Enumerate the positions of observations in the data file. If you have 2 traits, you should put 2 numbers (positions) here.
122 | - `RESIDUAL_VARIANCE`: Put a residual covariance matrix here. The whole matrix is needed (all upper, lower, and diagonal elements).
123 | - `EFFECT`: Enumerate the position of the effect for each trait in the data file; then put the effect type (and data type). For example, with a two-trait model, you should put 2 numbers (the positions of the effect for traits 1 and 2), 1 keyword (`cross` or `cov`) and, if the effect type is `cross`, the data type (`numer` or `alpha`).
124 | - `(CO)VARIANCES`: Put a covariance matrix here. The whole matrix is needed.
125 | 
126 | Run RENUMF90 with the above files, and you can see the parameter file configured with the two-trait model.
127 | 
128 | 
129 | Missing observations
130 | --------------------
131 | 
132 | You can include missing observations in the data file. RENUMF90 does not distinguish between real and missing observations; the program just reads the raw data and passes the observations to the renumbered file. Although RENUMF90 doesn't care about missing observations, BLUPF90 does. So the generated parameter file (`renf90.par`) should include an option to define the missing code. You can write the following option in the instruction file, and RENUMF90 passes it to `renf90.par`.
133 | 
134 | ~~~~~{language=blupf90}
135 | OPTION missing -999
136 | ~~~~~
137 | 
138 | This example defines the value `-999` as the code for a missing observation. The default is 0 (that is, if you do not define any missing code, the program assumes 0 as missing).
139 | 
140 | Note that RENUMF90 still treats 0 as a missing observation when it computes the basic statistics (for example, mean and standard deviation), which are shown just for your information. In this case, the statistics may be inaccurate, but all the files are correctly renumbered.
141 | 
142 | Different models across traits (Unequal design matrices)
143 | --------------------------------------------------------
144 | 
145 | Now we consider a multiple-trait model which has different mathematical models across traits. It is typically called a model with unequal design matrices. Let us see the following two-trait model.
146 | $$
147 | \begin{aligned}
148 | y_{ijk:1} &= A_{i:1} + \phantom{S_{j:1} + } \beta_1 x_{ijk:1} + u_{k:1} + e_{ijk:1}\\
149 | y_{ijk:2} &= A_{i:2} + S_{j:2} + \beta_{2} x_{ijk:2} + u_{k:2} + e_{ijk:2}
150 | \end{aligned}
151 | $$
152 | In this model, the $S_{j:1}$ is missing for trait 1. To support it with RENUMF90, you can just change a line in the previous instruction file.
153 | 
154 | ~~~~~{language=renumf90}
155 | EFFECT                  # 2nd effect fixed
156 | 0 3 cross alpha
157 | ~~~~~
158 | 
159 | The position `0` means that this effect will not be included in the model for this trait.
160 | 
161 | We further consider the following two-trait model.
162 | $$
163 | \begin{aligned}
164 | y_{ijk:1} &= A_{i:1} \phantom{+ S_{j:1} } + \beta_1 x_{ijk:1} + u_{k:1} + e_{ijk:1}\\
165 | y_{ijk:2} &= \phantom{A_{i:2} + } S_{j:2} + \beta_{2} x_{ijk:2} + u_{k:2} + e_{ijk:2}
166 | \end{aligned}
167 | $$
168 | In this case, $S_{j:1}$ is missing for trait 1 and $A_{i:2}$ is missing for trait 2. Applying the above principle to this case, you can figure out the following solution (just showing a piece of instruction file).
169 | 
170 | ~~~~~{language=renumf90}
171 | EFFECT                  # 1st effect fixed
172 | 2 0 cross alpha
173 | EFFECT                  # 2nd effect fixed
174 | 0 3 cross alpha
175 | ~~~~~
176 | 
177 | With the above statements, it is easy to see which effects are missing for a particular trait.
178 | 
179 | 
180 | There is another way to handle such a model. You can combine the above 2 `EFFECT` statements into 1 as follows.
181 | 
182 | ~~~~~{language=renumf90}
183 | EFFECT                  # 1st and 2nd effects fixed
184 | 2 3 cross alpha
185 | ~~~~~
186 | 
187 | This statement may be useful when the different effects are considered to be similar types of effects. For example, in dairy cattle, a contemporary group effect (like HYS: herd-year-season) is commonly included in the model for production traits, but its definition differs across parities. HYS in the 1st lactation and HYS in the 2nd lactation are different but are considered similar effects. With this compact definition, the memory requirements would be reduced. If you find this notation confusing, you do not have to use the compact statement.
188 | 
189 | 
190 | Summary
191 | -------
192 | 
193 | - RENUMF90 supports multiple-trait models.
194 | - Carefully describe `TRAITS`, `RESIDUAL_VARIANCE`, `EFFECT` and `(CO)VARIANCES` in an instruction file.
195 | - `TRAITS` has as many entries as the number of traits.
196 | - Each covariance matrix should be stated in full.
197 | - `EFFECT` describes the position of effect in each trait.
198 | - Missing observations are passed to `renf90.dat`. You can add `OPTION missing` to the instruction file to change the code for missing observations.
199 | - RENUMF90 supports a multiple-trait model with unequal design matrices.
200 | 


--------------------------------------------------------------------------------