I've been reading a lot of black bear papers recently in preparation for a grant proposal, but today I'm going to post about the issue of presence-only data: data that include only information about where individuals have been detected and not about where they have not been detected. Data that include both are often referred to as presence-absence data, which in most cases, if not all, are preferred, because information about where individuals are absent is just as informative for inferences about range distributions, population size, etc. However, sometimes such data can't be collected, because the data come from museum or herbarium collections or are collected opportunistically or incidentally from sightings or reports. Opportunistic citizen science data are one example of presence-only data. You take what you can get. In those cases, perhaps a lot of data can be collected, but the 'compromised' quality requires some thought and creative work-arounds. I've only just started thinking and reading about the problem, but my understanding of it is as follows:
The problem: The data collected consist of instances and locations where individuals or species have been observed or detected, i.e., y=1. Basing range distributions solely on these data ignores the unsampled places that truly have y=1, and there is also no information about where individuals are absent, i.e., y=0. There is no reference or background against which to assess the collected y=1's. Thus, only naive, cursory estimates of occupancy are possible, or at least that has been the prevailing thought.
Current proposed and adopted methods: In addition to the observed y=1's, environmental or covariate data are often also collected at the sampled locations, in an attempt to relate presence, occupancy, or distribution to environmental attributes like elevation, percent forest cover or urban densities.
There are envelope models that describe the distribution of the presence-only data. Methods like BIOCLIM, HABITAT, and SVM are examples, and I know nothing about these.
An option is to determine a reference or background against which to compare the observed y=1's. In other words, lacking y=0 data, an option is to generate them, i.e., to create pseudo-absences. But post-hoc y=0's could contain both true y=0's and some y=1's that appear to be y=0 because detection probability is less than 100%. Furthermore, a researcher then has to decide how many of these pseudo-absences to create, and that number can greatly affect the estimated probability of occurrence, i.e., of the state y=1. One approach to creating pseudo-absences is a case-control design, where the logistic regression for the probability of state y=1 given the environmental covariate data is adjusted with the ln of the proportions of the occupied and unoccupied locations. But we don't know how those occupied and unoccupied locations are actually split in the real world. The background can be viewed as an unsampled matrix of unused landscape (as with plants, which either use a spot or don't), or as available for use (a bear moving across a landscape could use agricultural areas, but perhaps just less often); this is apparently a subtle but methodologically important distinction. Some have suggested that the ratio of sampled to unsampled background locations be several orders of magnitude in size to minimize sampling errors (Manly et al 2002 and McDonald 2003 via Pearce and Boyce 2006). An exponential model can be used instead of the logistic model to estimate the relative likelihood of occupancy or occurrence, and finally another approach is to use a logistic regression to approximate a logistic discrimination model. When relative likelihoods of occupancy are estimated, they are not constrained to be less than 1, which is weird. All of these approaches attempt to account for the background or landscape from which the observed y=1's were collected, but each takes a slightly different approach that I have not read enough on to describe.
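To make the pseudo-absence problem concrete for myself, here is a minimal sketch of an intercept-only case. Everything in it is hypothetical: the function names are mine, the landscape sizes are made up, and I am assuming the idealized situation where pseudo-absences really are drawn from unoccupied locations. The naive estimate is just the share of y=1 in the sample, so it changes with the number of pseudo-absences. The case-control adjustment subtracts the ln of the ratio of sampling fractions, which removes that dependence but only if the true occupied/unoccupied split is known.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def naive_occurrence(n_presence, n_pseudo_absence):
    """Intercept-only logistic fit: the fitted probability is the sample share of y=1."""
    intercept = math.log(n_presence / n_pseudo_absence)
    return expit(intercept)


def case_control_occurrence(n_presence, n_pseudo_absence, n_occupied, n_unoccupied):
    """Adjust the intercept by the ln of the ratio of sampling fractions.

    n_occupied and n_unoccupied are the (normally unknown) numbers of occupied
    and unoccupied locations in the whole landscape; they are assumptions here.
    """
    frac_presence = n_presence / n_occupied          # share of occupied sites sampled
    frac_absence = n_pseudo_absence / n_unoccupied   # share of unoccupied sites sampled
    intercept = math.log(n_presence / n_pseudo_absence) - math.log(frac_presence / frac_absence)
    return expit(intercept)


if __name__ == "__main__":
    # Hypothetical landscape: 20,000 occupied and 80,000 unoccupied locations.
    for n_pseudo in (100, 1000, 10000):
        naive = naive_occurrence(100, n_pseudo)
        adjusted = case_control_occurrence(100, n_pseudo, 20000, 80000)
        print(n_pseudo, round(naive, 4), round(adjusted, 4))
```

The adjusted value stays put as the number of pseudo-absences changes, but it is only as good as the assumed split of the landscape, which is exactly the thing we don't know.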
Pearce and Boyce (2006) state, "We are unaware of any application explicitly modelling abundance given presence only". Although not about abundance, Royle et al. (2012) came out with a likelihood approach for estimating occurrence/occupancy probability from presence-only data, arguing that the popular and widely used MaxEnt doesn't actually do that but instead provides habitat suitability indices that are quite different from estimates of occurrence probability. Royle et al. provide a parametric approach, which can be implemented via maxlike, an R package, and which invokes Bayes' rule and requires random sampling and constant detection probability. The major problem with MaxEnt, they argue, as I see it, is that it uses a penalized, exponential version of the detection probability given occurrence, based on the maximum entropy distribution. This penalization shrinks the regression coefficients toward 0, but Royle et al. argue that this approach biases the estimator because the intercept, Beta0, is set to an arbitrarily determined number. In comparing MaxEnt to their approach, they found that MaxEnt produced variable under- and overestimates. They caution that effort and detection probability are in fact often not consistent, such as with roadside surveys or where density of the study population or effort is high.
Of course, a reply came quickly, from Hastie and Fithian (2013), and the debate about making inferences from presence-only data is not over yet. They claim Royle et al. have performed "statistical alchemy" by imposing parametric assumptions to estimate overall occurrence probability. This is shaky ground on which to build inference.
Needless to say, presence-only data are tricky to work with, and presence-absence data seem preferable in every comparison. Despite these issues, another aspect of presence-only data, that they can be cheaper to collect and therefore yield larger datasets, makes them attractive to use. In particular, I have been thinking about studies that may try to combine presence-absence data, such as from capture-recapture and occupancy efforts, with presence-only data. It seems like finding a way to generate pseudo-absences to make the presence-only data mirror the presence-absence data would be one approach. I can imagine studies where the presence-absence data come from capture-recapture, while the presence-only data come from depauperate occupancy approaches. How to combine them, then? Blanc et al. (2014) provide one such example with Eurasian lynx, by making abundance an explicit instead of a derived parameter in the estimating models, and hinging the connection between abundance and occupancy on the fact that occupancy is only possible when abundance is >0. They mention that their approach is a development of Freeman and Besbeas (2012) with the addition of imperfect detection. But then they mention that this is all for non-spatial capture-recapture, because their abundance N ~ homogeneous Poisson(lambda) and N is explicit, whereas spatial capture-recapture approaches use an inhomogeneous process and N is derived.
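Here is a small sketch of why the choice between an exponential and a logistic form matters for the intercept. It assumes my reading of the presence-only likelihood, in which each presence location contributes psi(x_i) divided by the sum of psi over all landscape pixels; the function name, the toy covariate values, and the single-covariate setup are all hypothetical. With psi = exp(b0 + b1*x) the intercept cancels, so only relative likelihoods are recoverable. With a logistic psi it does not cancel, which is what lets a parametric approach estimate an actual probability, given random sampling and constant detection.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def presence_only_loglik(b0, b1, presence_x, landscape_x, link="logit"):
    """Log-likelihood of presence-only covariate values.

    Each presence contributes psi(x_i) / sum_j psi(x_j), where the sum runs
    over every pixel in the landscape (an assumed, simplified form).
    """
    if link == "exp":
        def psi(x):
            return math.exp(b0 + b1 * x)
    elif link == "logit":
        def psi(x):
            return expit(b0 + b1 * x)
    else:
        raise ValueError("link must be 'exp' or 'logit'")
    log_denominator = math.log(sum(psi(x) for x in landscape_x))
    return sum(math.log(psi(x)) for x in presence_x) - len(presence_x) * log_denominator


# Toy landscape: one covariate measured on nine pixels; four pixels with detections.
landscape_x = [-2.0 + 0.5 * i for i in range(9)]
presence_x = [0.5, 1.0, 1.5, 2.0]

if __name__ == "__main__":
    for link in ("exp", "logit"):
        low = presence_only_loglik(-1.0, 1.0, presence_x, landscape_x, link)
        high = presence_only_loglik(2.0, 1.0, presence_x, landscape_x, link)
        print(link, round(low, 3), round(high, 3))
```

In the exponential case the two intercepts give the same log-likelihood, so the data say nothing about b0. In the logistic case they differ, although how well b0 can be estimated in practice is a separate question and part of the ongoing debate.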
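The link between abundance and occupancy can be written down directly if abundance at a site is Poisson. This is a sketch under that assumption only, with hypothetical function names, and it ignores imperfect detection: occupancy is the probability that N > 0, which for N ~ Poisson(lambda) is 1 - exp(-lambda).

```python
import math


def occupancy_from_abundance(lam):
    """P(N > 0) when N ~ Poisson(lam)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return 1.0 - math.exp(-lam)


def abundance_from_occupancy(psi):
    """Invert the link: the Poisson mean implied by an occupancy probability."""
    if not 0 <= psi < 1:
        raise ValueError("psi must be in [0, 1)")
    return -math.log(1.0 - psi)


if __name__ == "__main__":
    for lam in (0.1, 0.5, 1.0, 3.0):
        print(lam, round(occupancy_from_abundance(lam), 3))
```

Because the link is one-to-one, occupancy data carry information about lambda, which is why the two data types can share a parameter in a combined model.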
Pearce, J. L. and Boyce, M. S. (2006), Modelling distribution and abundance with presence-only data. Journal of Applied Ecology, 43: 405–412. doi: 10.1111/j.1365-2664.2005.01112.x
Royle, J. A., Chandler, R. B., Yackulic, C. and Nichols, J. D. (2012), Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions. Methods in Ecology and Evolution, 3: 545–554. doi: 10.1111/j.2041-210X.2011.00182.x
Hastie, T. and Fithian, W. (2013), Inference from presence-only data; the ongoing controversy. Ecography, 36: 864–867. doi: 10.1111/j.1600-0587.2013.00321.x
Blanc, L., Marboutin, E., Gatti, S., Zimmermann, F. and Gimenez, O. (2014), Improving abundance estimation by combining capture–recapture and occupancy data: example with a large carnivore. Journal of Applied Ecology, 51: 1733–1739. doi: 10.1111/1365-2664.12319
I am a PhD student at Cornell University, in the Department of Natural Resources. I am also a student in the New York Cooperative Fish and Wildlife Research Unit.
Friday, January 30, 2015
Thursday, January 22, 2015
Influence of drift and admixture on population structure of American black bears in the Central Interior Highlands 50 years after translocation
Puckett, E. E. et al. 2014. Molecular Ecology 23: 2414-2427
Objectives
During the 19th and early 20th centuries, black bears experienced both overall range contraction and local extirpation (Smith and Clark 1994?). In the 60s and 70s, black bears were translocated from Minnesota and Manitoba to Arkansas, and since then the population size has increased. It has been 8 generations since translocation, given a generation time of 6.3 years (Onorato et al 2004). There are some physical barriers, including rivers, highways, and discontinuous forest habitat.
"Bottlenecks, founder events (Nei et al 1975), and genetic drift (Nei and Tajima 1981) often result in decreased genetic diversity and increased population differentiation". On the other hand, migration can decrease the effects of drift due to gene flow.
The objective of the study was to identify population structure of bears in the 50 years following translocation and test for signatures of remnant genetic lineages.
Methods:
15 microsatellites for 7 studies, totaling n=643 bears
- identified parent-offspring pairs with ML-RELATE and removed one individual from each pair
- deviations from HWE with ARLEQUIN
- null alleles with Micro-checker.
- differences in allelic richness with Kruskal-Wallis test
- population structure with STRUCTURE, and analysed hierarchical substructure with separate analyses for each of the K=4 clusters under admixture
- migration using BAYESASS.
- demographic history with DIYABC, and tested different hypotheses: admixture, founder, split
mtDNA with cytochrome b.
- aligned data with GENEIOUS
- assigned new haplotypes to Wooding and Ward's clades with MRBAYES.
- identified substitution model with FINDMODEL
- haplotype and nucleotide frequencies in ARLEQUIN.
Results
STRUCTURE results at successive numbers of K reflect the varying processes affecting differentiation between populations.
Haplotypic diversity mirrored nuclear genetic diversity. Admixture was the best supported model. Lower frequency haplotypes at the tips of the network suggest new mutations derived from haplotypes occupying internal nodes, but distinct haplotypes could be remnant original ones, recent mutations, or the result of introduction from translocation.
Drift supported by decreased Fst values from the sources in MN and Manitoba to the study area
Management implications: conserve gene flow, afford protection to subpopulations
Tuesday, January 20, 2015
Phylogeography and Pleistocene Evolution in the North American Black Bear
Wooding, S., and R. Ward. 1997. Mol. Biol. Evol 14:1096-1105.
Objectives: "to determine the character of phylogeographic structuring in a widespread North American carnivore" by 1) identifying if distinct patterns of distribution are present, 2) ascertaining the time scale over which diversity has evolved, using a molecular clock, 3a) identifying patterns of recent population growth using pairwise comparison of sequences, 3b) determining prevalence of migration by comparing geographic distributions of diversity and the context of lineage age, and 4) discussing patterns of genetic diversity with respect to geological and habitat changes
Methods:
- n=118 mtDNA sequences; Human primers H16498 and L15997, to amplify a control region in the mtDNA; Sequencing of single strand products
- n=258 RFLPs of bears from 16 localities; Clades identified in sequencing were used to identify diagnostic RFLPs so that future unsequenced samples could be assigned to clades by amplifying with the human primers and digesting the PCR products with restriction enzymes
- Calculated a nucleotide substitution rate for the control region, using methods detailed by Waits 1996, resulting in 2.8% per Myr (slow for mammal coding)
- Used Asiatic black bear as outgroup for phylogeographic analyses
- Population growth assessed with mismatch distributions for pairwise sequences, to see if sample evolved in a growing population (Rogers 1995).
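The mismatch distribution in the last step is simple to compute: count the number of differing sites for every pair of sequences and tabulate the counts. The sketch below uses made-up, pre-aligned toy sequences and hypothetical function names; it only builds the distribution and does not fit the population growth model that the distribution is compared against.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mismatch_distribution(sequences):
    """Map each number of pairwise differences to the number of pairs showing it."""
    counts = Counter(pairwise_differences(a, b) for a, b in combinations(sequences, 2))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    toy_sequences = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGTACGT"]
    print(mismatch_distribution(toy_sequences))
```

As I understand it, a smooth, single-peaked distribution is the signature expected from a sample that evolved in a growing population, while a ragged or multi-peaked one points toward a more stable population.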
"The long-term population history of black bears appears to be characterized predominantly by long-term regional isolation followed by recent contact and hybridization": two major clades were identified from 12 lineages and were spatially clustered, with one clade represented in 14/16 localities. The clades differed at an average of 4.8% of nucleotide positions, which is unusual within mammalian populations, and suggests deep/long-term divergence. The black bear clades seem to have originated at the Pliocene/Pleistocene boundary, 1.6-2.0 MYA, with patterns in diversity congruent with forest refuge formation during the Pleistocene and regional expansion based on expanding forest refugia.
Within regions, no obvious phylogeographic structuring is present, and dispersal between populations is probably a regular occurrence (may need to look at microsatellites to find any significant structure, which could also identify sex-biased differences since usats are biparentally inherited). Black bears have expanded with changes in their forest habitat, and patterns of genetic diversity within regions may be strongly affected by both regional mixing and population growth.
Keywords and Concepts:
molecular phylogeography: "a means of understanding evolutionary processes within species" (Avise 1994), and for "understanding the historical factors leading to extant patterns of diversity", using information from "geographical distribution and topological relationships of genetic lineages, which reflects the long term structure and demographic history of populations"
mtDNA: mitochondrial DNA, which is circular and double-stranded. It is only inherited maternally, and therefore has a smaller effective population size, so genetic drift can have a stronger effect. It has a higher mutation rate than nuclear DNA, and therefore can be used to track long ancestries
RFLPs: restriction fragment length polymorphism. fragmenting sequences of homologous DNA using restriction enzymes, and then separating the fragments by length.
Simulations suggest that lineage age ∝ lineage range ( Neigel and Avise 1993)
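To make the RFLP definition above concrete, here is a toy digest. It treats the DNA as a single linear strand (a simplification, since mtDNA is circular and double-stranded) and uses the EcoRI recognition site GAATTC, cut after the G, purely as an example enzyme; the haplotype sequences and function names are made up. A single substitution inside a recognition site removes a cut, which changes the fragment lengths and makes the site diagnostic.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site where the cut falls.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments


def fragment_lengths(sequence, site="GAATTC", cut_offset=1):
    """Fragment lengths sorted shortest to longest, as bands would separate by length."""
    return sorted(len(fragment) for fragment in digest(sequence, site, cut_offset))


if __name__ == "__main__":
    haplotype_a = "ATGAATTCGGCCTAGAATTCAA"  # two recognition sites
    haplotype_b = "ATGAATTCGGCCTAGAGTTCAA"  # second site lost to one substitution
    print(fragment_lengths(haplotype_a))
    print(fragment_lengths(haplotype_b))
```

The two haplotypes give different sets of fragment lengths, which is the polymorphism that lets unsequenced samples be assigned to a clade.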
The beginning of a long organization project, hopefully
Good Afternoon!
This blog is a personal attempt to organize the papers I've read for discussion meetings with my research labs: one loosely focused on conservation genetics and another on spatial ecology. I recognize that this could be done using a citation manager, but I've got a backlog of papers starting from a few years back. I'd like to make sure that I lose as little as possible of the information and understanding gained from reading these papers. Furthermore, this will be good for me to "write" on a consistent basis.
Each post will be a summary of a paper that I've read and discussed in lab meetings. They will include major conclusions, definitions of key words, and explicit but generalizable concepts. Organization and linking between posts will be maintained through blog labels, so that I can quickly find all the papers that talk about particular concepts, like Fst, for example.