Increasing Breeding without Breeding (BwB) Efficiency: Full- vs. Partial-Pedigree Reconstruction in Lodgepole Pine



Introduction
Forest tree breeding is a long-term endeavor that often adopts the recurrent selection scheme [1], in which hundreds of parents are rigorously tested through the performance of several thousand of their offspring planted over vast geographic territories known as breeding zones [2]. Parental ranking for forward selection is often based on offspring performance and is followed by the selection of elite genotypes for either new rounds of breeding (mating, testing, and selection) or the establishment of production populations (i.e., seed orchards) [3]. Breeding and testing are the most costly and time-consuming aspects of tree breeding. Breeding follows one of the established mating designs to generate the "structured" pedigree (half- and full-sib families) needed to estimate genetic parameters (e.g., trait heritabilities and correlations, and parental and offspring breeding values) [4]. Creating a structured pedigree is meticulous work that requires great care and often takes multiple years to complete, owing to the large number of parents and the numerous crosses required. Completion of the breeding phase is often delayed by fertility and phenological differences among the breeding parents [5]. The authenticity of the resulting offspring affects the accuracy of the estimated genetic parameters and, ultimately, the attained genetic gain; unfortunately, this process is never error-free [6,7].
Forest tree breeders have attempted to simplify breeding through the use of "wind-/open-pollinated" families [8,9], often treating them as half-sib families because the maternal parents are known and the offspring are assumed to be sired by a large number of male donors; however, the probability of having full-sibs or selfs within these "half-sib" families is high. Thus, treating wind-/open-pollinated families as half-sibs inflates the estimated additive genetic variance and, subsequently, the breeding values and heritabilities, resulting in an inaccurate ranking of parents (seed donors) [10-12]. The availability of reliable, informative molecular markers coupled with paternity assignment methods [13] created an opportunity whereby the breeding phase of tree breeding could be effectively eliminated. Lambeth et al. [14] were the first to capitalize on this development and used paternity assignment to identify the paternal parents in a polymix breeding framework. This approach was further extended into the "Breeding without Breeding" concept [15-18], which offers a viable option for avoiding the breeding phase in tree breeding programs.
Here we test two sampling methods for structured pedigree assembly, namely partial- and full-pedigree reconstruction, using equivalent sample sizes drawn from a 74-parent lodgepole pine parental population. Partial- and full-pedigree reconstruction were represented by family array (individuals generated from a subset of parental seed-donors) and random sampling (individuals drawn from a seedling population representing the reproductive output of the entire parental population), respectively. Pedigree reconstruction was based on nuclear and chloroplast DNA microsatellite markers.

Abstract
The advantage of paternity assignment in assembling structured pedigree for breeding is investigated using two sampling methods, namely family array (known maternal parent) and random offspring (unknown maternal and paternal parents), collected from an open-pollinated lodgepole pine experimental population with known parents (N = 74) and genotyped with nuclear and chloroplast microsatellite markers. Offspring of equivalent sample sizes representing the family array (n = 619) and random offspring (n = 635) were genotyped and subjected to partial and full pedigree reconstruction, respectively. The full pedigree reconstruction assembled a substantially larger number of full-sib families than the partial (446 vs. 268) and, interestingly, the two methods detected equivalent amounts of external gene flow into the experimental population. The superiority of random offspring over family array sampling in producing more full-sib families was attributed to its better representation of the parental population, as random sampling included offspring from most parents, unlike the parent-limited family array. Owing to the observed advantages, full pedigree reconstruction could be employed as an alternative to the breeding phase commonly required in conventional breeding programs to develop the structured pedigree needed for estimating genetic parameters.

Seed orchard population and offspring sampling
A 71-clone lodgepole pine seed orchard located near Armstrong, British Columbia (50° 23' N, 119° 17' W, 470 m a.s.l.) provided the material for this study. The orchard was established in 1994 following the permutated neighborhood design, which maximizes the separation distances among ramets of the same clone, hence minimizing selfing [19]. At the time of sampling (2007), the orchard's population consisted of 1,047 ramets representing the 71 parents (13.9 ± 7.0 SD ramets per parent). Dormant vegetative buds were sampled from the orchard's entire parental population (2 random ramets/parent). Seed was sampled using two methods: 1) family array (11 known seed-donors, each with 56.3 ± 7.3 SD seeds/parent; N = 619) and 2) bulk sample (a random sample of 635 seeds from the entire orchard's seed crop with unknown maternal and paternal parentage). The dormant buds were stored at -80°C until DNA extraction, while the seeds were stored at 4°C until germination.

Parentage analyses
For paternity assignment, we used a likelihood-based paternity inference method that provides a known level of statistical confidence and accounts for genotyping errors [25] (CERVUS 3.0.3). Two parentage analyses were carried out: one for the family array with known maternal parents, and a parent pair analysis with unknown sexes of the candidate parents for the bulk seed sample. The candidate paternal population (N = 74) comprised the orchard's 71 known parents plus 3 additional alien genotypes detected during the orchard's parental genotyping. The parentage analysis for the known mother-offspring genotypes was based on 10,000 simulations with 74 sampled candidate parents, a genotyping error rate of 0.01, and a 95% (strict) confidence level using the 9 nuclear SSRs. We chose 6 cpSSRs to permit identification of the paternal parent from the most likely parent pair [24]. We conducted the identity analysis with the cpSSRs (also in CERVUS 3.0.3) after creating dummy genotypes by converting the haploid profiles to hypothetical, completely homozygous diploid genotypes. For each offspring, the paternal parent determined by the identity analysis was compared with the two parents identified by the parent pair analysis. The maternity analysis with known fathers (strictly speaking, with fathers deduced from marker evidence) was then conducted for these offspring using the same parameters described earlier.
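The cpSSR step described above can be illustrated with a minimal Python sketch. It is not the CERVUS identity analysis: it uses a plain exact-match lookup, and the parent IDs, allele sizes, and function names (`to_dummy_diploid`, `paternal_parent`) are hypothetical. It assumes, as the text does, that the chloroplast haplotype is inherited from the paternal parent.

```python
# Hypothetical illustration: IDs and allele sizes are invented, and the
# exact-match rule is a simplification, not the CERVUS identity analysis.

def to_dummy_diploid(haplotype):
    """Duplicate each haploid cpSSR allele to form a homozygous 'dummy' genotype."""
    return tuple((allele, allele) for allele in haplotype)


def paternal_parent(offspring_hap, parent_pair, parent_haps):
    """Return the member of the parent pair whose cpSSR profile matches the offspring.

    Returns None when neither or both parents match (paternity unresolved).
    """
    target = to_dummy_diploid(offspring_hap)
    matches = [p for p in parent_pair if to_dummy_diploid(parent_haps[p]) == target]
    return matches[0] if len(matches) == 1 else None


# Invented six-locus chloroplast haplotypes (allele sizes in bp).
parent_haps = {
    "P37": (101, 88, 120, 95, 143, 110),
    "P52": (103, 88, 121, 95, 143, 112),
}
offspring_hap = (103, 88, 121, 95, 143, 112)

# The parent pair analysis is assumed to have returned P37 and P52 (sexes unknown).
pair = ("P37", "P52")
father = paternal_parent(offspring_hap, pair, parent_haps)
mother = next((p for p in pair if p != father), None) if father else None
print(father, mother)
```

When both members of a pair share a chloroplast haplotype, the sketch leaves paternity unresolved rather than guessing.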

Results
The paternity assignment analyses were successful in assigning the male parent for 528 out of 619 offspring (85.3%) and both male and female parents for 522 out of 635 offspring (82.2%) for the family array and random sample, respectively.
The inability to assign paternity or maternity to the remaining offspring was due either to insufficiently informative genotypes to match the candidate parents with 95% confidence, or to seeds being sired by parents from outside the studied population (i.e., the product of gene flow/pollen contamination), or to a combination of both. Because the 9 nuclear SSRs used are highly polymorphic and possess low null allele frequencies [23], and because most of the unassigned offspring had mismatches at two or more loci, it is reasonable to assume that the loci used provide the required statistical power.
The additional 6 uniparentally inherited cpSSRs (mean: 4.8 and SD: 1.3 alleles/locus, range: 4-7) produced 51 unique multilocus haplotypes. These unique haplotypes were essential in providing the high discrimination power needed for the successful assignment of the male parents in the random sample and increased the number of offspring successfully assigned to one of the candidate fathers to 545 (85.8%), an additional 23 offspring. The identity analysis fully corresponded with the parent pair analysis: for each of the analyzed offspring, the assigned candidate paternal parent was the same as one of the two most likely parents determined by the parent pair analysis (545 offspring in total). The offspring unassigned on the male side are most likely a product of gene flow from non-sampled paternal parents outside the studied population, producing gene flow estimates of 14.7 and 14.2% for the family array and bulk seed sample, respectively. The nearly identical gene flow estimates support the accuracy of the pedigree reconstruction in assigning the male parent for the family array and both the female and male parents for the random sample. The utility of these unassigned individuals for quantitative genetic analyses is addressed in the Discussion section (below). It should be noted, based on these results, that had we used only the nuclear markers and the standard parent pair analysis, we would have been able to identify which two parents produced a given offspring, but not which of the two was the paternal parent.
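The assignment rates and gene flow estimates reported above follow directly from the offspring counts. The short Python check below reproduces them; the variable names are ours, and gene flow is taken, as in the text, to be the share of offspring with no assigned candidate father.

```python
def pct(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Counts reported in the Results.
array_total, array_fathers = 619, 528      # family array: paternity assigned
random_total, random_pairs = 635, 522      # random sample: parent pair assigned
random_fathers = random_pairs + 23         # fathers assigned after adding cpSSRs

assignment_rates = {
    "family array, father": pct(array_fathers, array_total),        # 85.3
    "random sample, parent pair": pct(random_pairs, random_total),  # 82.2
    "random sample, father": pct(random_fathers, random_total),     # 85.8
}

# Offspring without an assigned candidate father are attributed to gene flow.
gene_flow = {
    "family array": pct(array_total - array_fathers, array_total),      # 14.7
    "random sample": pct(random_total - random_fathers, random_total),  # 14.2
}

print(assignment_rates)
print(gene_flow)
```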
Pedigree reconstruction of the family array produced 268 full-sib families nested within the 11 sampled maternal half-sib families, ranging in number from 17 (maternal half-sib family #37) to 31 (#52) per maternal family and in size from 1 to 15 individuals per full-sib family (Figure 1). Pedigree reconstruction of the random sample captured offspring of 65 of the 74 candidate mothers present in the seed orchard (87.8%) and consequently revealed a considerably higher number of full-sib families than the family array analysis (446 full-sib families, ranging in size from 1 to 4; Figure 2). These results were anticipated because the random sample, unlike the maternal family array, represented the entire population's reproductive output.
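Once each offspring has an assigned mother and father, tallying full-sib families is a matter of counting distinct parent pairs. The Python sketch below shows one way to do this on invented assignments; the IDs are hypothetical, and treating reciprocal crosses (A x B vs. B x A) as separate families is an assumption of the sketch rather than something stated in the text.

```python
from collections import Counter

# Invented (mother, father) assignments; None marks an unassigned parent.
assignments = [
    ("M37", "P52"),
    ("M37", "P52"),
    ("M37", "P61"),
    ("M52", "P37"),
    ("M52", None),   # father unassigned (e.g., gene flow): not a full-sib record
]

# A full-sib family is a distinct (mother, father) pair; its size is the
# number of offspring assigned to that pair.
full_sib = Counter(
    (mother, father)
    for mother, father in assignments
    if mother is not None and father is not None
)

n_families = len(full_sib)
family_sizes_sorted = sorted(full_sib.values())
print(n_families, family_sizes_sorted)
```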
The paternal half-sib family sizes ranged from 1 (nine families) to 58 (family #52) and from 1 (three families) to 28 (family #61) for the family array and random sample, respectively, with a positive correlation (r = 0.61, N = 74, p < 0.05) (Figure 3). This represents Pearson's product-moment correlation between the vectors of paternal half-sib family sizes (male reproductive success) estimated by the two approaches (i.e., family array and bulk sample) for all 74 paternal parents in the seed orchard.
The variation in paternal half-sib family sizes between these two approaches might have been due to the sampling methods, because seed representing each maternal half-sib family (i.e., family array) was collected from a single ramet (i.e., one position), while the random sample was taken from a mixture of seed collected from the entire seed-producing population. Figure 2 illustrates the ability of the random sample to form a substantial number of full-sib families representing 87.8% of the population's parents, and demonstrates the restricted capacity of family array sampling, which is limited by the number of seed-donors sampled.
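The correlation described above compares two vectors of paternal half-sib family sizes defined over the same set of candidate fathers. The Python sketch below shows how such vectors could be built and correlated; the four-father data set and the helper names are hypothetical and far smaller than the 74-parent orchard, and including fathers with zero offspring in the vectors is an assumption of the sketch.

```python
from collections import Counter
from math import sqrt


def family_sizes(assigned_fathers, candidates):
    """Paternal half-sib family size per candidate (0 when no offspring was sired)."""
    counts = Counter(assigned_fathers)
    return [counts.get(c, 0) for c in candidates]


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented father assignments for the two sampling methods.
candidates = ["P1", "P2", "P3", "P4"]
array_fathers = ["P1", "P1", "P1", "P2", "P3"]
random_fathers_assigned = ["P1", "P1", "P2", "P2", "P4"]

array_sizes = family_sizes(array_fathers, candidates)
random_sizes = family_sizes(random_fathers_assigned, candidates)
r = pearson_r(array_sizes, random_sizes)
print(array_sizes, random_sizes, round(r, 2))
```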

Discussion
Forest tree breeders utilize mating designs to create the "structured" pedigree needed for estimating the genetic parameters used to identify elite genotypes and select them for either breeding or seed production (seed orchards) [2]. Tree breeding programs often harbor large numbers of parents; thus, irrespective of which mating design is used, a substantial number of controlled crosses is needed. The physical task of making controlled crosses is often hampered by variation in parental fecundity and reproductive phenology; thus, in most cases multiple years are needed to complete this phase, and even when it is completed, cases of mistaken parental identity are common [6,7]. The partial or complete avoidance of controlled crosses for structured pedigree formation would be a favorable development for tree breeding programs. The combined use of DNA fingerprinting and pedigree reconstruction provides an opportunity to bypass the breeding phase and assemble the "structured" pedigree needed for quantitative genetic analyses. It should be stated that the structured pedigree resulting from pedigree reconstruction is often unbalanced, favoring the more fecund parents, and is greatly affected by the degree of gene flow from undesirable outside sources (i.e., wasted genotyping effort) (Figures 1 and 2). However, quantitative genetics software such as ASReml [26], with its versatility in handling very large, multi-generational, and statistically and genetically unbalanced data sets, has made these analyses feasible and removed the requirement for balanced pedigrees or statistical designs. This was clearly demonstrated by El-Kassaby et al. [16], who presented an analysis of an unbalanced structured pedigree that included a mixture of full- and half-sib families with various sample sizes. The inclusion of half-sib families in the analysis means that offspring from known mothers but unknown fathers (i.e., those sired by gene flow) can be effectively used to increase the precision of the estimated genetic parameters; thus, the concern about "wasted" fingerprinting effort is addressed.
Increasing Breeding without Breeding (BwB) Efficiency: Full- vs. Partial-Pedigree Reconstruction in Lodgepole Pine. SOJ Genet Sci 2(1):1-6. Copyright: © 2015 El-Kassaby et al.
The advantage of pedigree reconstruction, partial or full, is apparent from Figures 1 and 2. If the disconnected diallel mating design had been used to create crosses for the 74 parents used in this study, then at least 12 six-parent diallel units would have been needed and a total of 180 crosses would have been created. The family array and random sampling produced 268 and 446 crosses, respectively, exceeding the number from the disconnected diallel mating design without a single controlled cross being made. The resulting crosses offered more mating combinations than those from the disconnected diallel mating design, thus eliminating the sampling caveat of this design, in which crosses are restricted to within diallel units and not among them. It should be stated that the nuclear SSR markers alone were sufficient for constructing the resulting crosses in the partial pedigree reconstruction, as the offspring were collected from known maternal parents and thus the inference of parentage was restricted to the paternal component. On the other hand, identifying the paternal parent in the bulk seed sample required an additional set of uniparentally inherited markers; thus, cpDNA markers were used to separate males with similar nuclear genotypes [27-29].
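The figure of 180 crosses for the disconnected diallel comparison can be reproduced with a short calculation. The Python sketch below assumes each six-parent unit is a half diallel (every pair of parents crossed once, with no selfs or reciprocals); that assumption is ours, chosen because it yields the stated total.

```python
from math import comb

units = 12              # six-parent diallel units, as stated in the text
parents_per_unit = 6

# Assumed half diallel: one cross per unordered pair of parents within a unit.
crosses_per_unit = comb(parents_per_unit, 2)    # 15
diallel_crosses = units * crosses_per_unit      # 180

# Full-sib families recovered by pedigree reconstruction (from the Results).
reconstructed = {"family array": 268, "random sample": 446}
surplus = {name: n - diallel_crosses for name, n in reconstructed.items()}
print(diallel_crosses, surplus)
```

Under this assumption the family array and random sample exceed the diallel total by 88 and 266 crosses, respectively.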
Pedigree reconstruction has been extensively used to assess male and female fertility variation as well as selfing and gene flow rates in seed orchard populations [22,30-34]. The use of pedigree reconstruction as a platform for breeding was first proposed by El-Kassaby et al. [15]; its theoretical foundation was illustrated by El-Kassaby and Lstibůrek [34] using a Douglas-fir retrospective study, and it was further demonstrated as an avenue for testing and selecting elite genotypes using a combination of assembled full-sib and wind-pollinated half-sib families from a western larch experimental population [16]. However, it should be stated that the work of Lambeth et al. [14] was inspiring, as it demonstrated the power of pedigree reconstruction in determining the male parents of crosses produced through the polycross mating design (the pollen consisted of a mixture from several male parents; thus, paternity was unknown and the resulting families were considered half-sibs) and thus converted a set of half-sib families into full-sib families. Pedigree reconstruction as an aid to breeding has gained momentum, and several retrospective studies on Eucalyptus urophylla [35], Pinus pinaster [36], Abies nordmanniana [37], and Picea rubens [38] have been documented.
In conclusion, based on the results of the present study, we recommend full pedigree reconstruction using individuals with unknown paternal and maternal parentage to enable the a posteriori assembly of naturally occurring crosses among the population's members, creating a mating design of an extent that would otherwise be achievable only through controlled pollination at extremely high cost and labor.

Figure 1: Distribution of 528 naturally occurring matings in a lodgepole pine seed orchard (74 parents) revealed by partial pedigree reconstruction of 11 wind-pollinated maternal half-sib families using nine nuclear microsatellite loci.

Figure 2: Distribution of 522 naturally occurring matings in a lodgepole pine seed orchard (74 parents) revealed by full pedigree reconstruction of a random sample of offspring with unknown maternal and paternal parentage using a combination of nine nuclear and six chloroplast microsatellite loci.

Figure 3b: Maternal half-sib family sizes obtained from full pedigree reconstruction of randomly sampled offspring from a lodgepole pine seed orchard (black bars represent the 11 family arrays studied).