igHome is a customizable start page introduced in 2012 as an alternative to iGoogle, the personal web portal launched by Google in May 2005. Just like iGoogle, igHome offers users the possibility to build a start page containing a central search box and a number of gadgets. igHome mimics the user interface of iGoogle. Registered igHome users can create multiple tabs and import RSS feeds.
Natural Language Toolkit
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for English, written in the Python programming language. It supports classification, tokenization, stemming, tagging, parsing, and semantic reasoning. It was developed by Steven Bird and Edward Loper in the Department of Computer and Information Science at the University of Pennsylvania. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the concepts behind the language processing tasks supported by the toolkit, plus a cookbook. NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems.
== Library highlights ==
* Discourse representation
* Lexical analysis: word and text tokenizer
* n-gram and collocations
* Part-of-speech tagger
* Tree model and text chunker for capturing
* Named-entity recognition
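To make two of the highlighted tasks concrete, the following is a minimal sketch of word tokenization and n-gram extraction in plain Python. It does not use NLTK and is not how NLTK implements these features; the function names `tokenize` and `ngrams` and the regular expression are assumptions chosen for illustration only.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("The quick brown fox jumps over the lazy dog")
bigrams = ngrams(tokens, 2)
```

A real tokenizer has to handle punctuation, contractions, and language-specific rules, which is the kind of detail a toolkit such as NLTK is meant to take care of.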
Multiple sequence alignment
Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis and can highlight homologous features between sequences. Alignments highlight mutation events such as point mutations (single amino acid or nucleotide changes), insertion mutations and deletion mutations, and alignments are used to assess sequence conservation and infer the presence and activity of protein domains, tertiary structures, secondary structures, and individual amino acids or nucleotides. Multiple sequence alignments require more sophisticated methodologies than pairwise alignments, as they are more computationally complex. Most multiple sequence alignment programs use heuristic methods rather than global optimization because identifying the optimal alignment between more than a few sequences of moderate length is prohibitively computationally expensive. However, heuristic methods generally cannot guarantee high-quality solutions and have been shown to fail to yield near-optimal solutions on benchmark test cases. 
== Problem statement ==
Given $m$ sequences $S_i$, $i = 1, \ldots, m$, similar to the form below:

$$S := \begin{cases} S_1 = (S_{11}, S_{12}, \ldots, S_{1n_1}) \\ S_2 = (S_{21}, S_{22}, \ldots, S_{2n_2}) \\ \quad\vdots \\ S_m = (S_{m1}, S_{m2}, \ldots, S_{mn_m}) \end{cases}$$

A multiple sequence alignment is taken of this set of sequences $S$ by inserting any number of gaps needed into each of the sequences $S_i$ of $S$ until the modified sequences, $S'_i$, all conform to length $L \geq \max\{n_i \mid i = 1, \ldots, m\}$ and no column of the aligned sequences consists only of gaps. The mathematical form of an MSA of the above sequence set is shown below:

$$S' := \begin{cases} S'_1 = (S'_{11}, S'_{12}, \ldots, S'_{1L}) \\ S'_2 = (S'_{21}, S'_{22}, \ldots, S'_{2L}) \\ \quad\vdots \\ S'_m = (S'_{m1}, S'_{m2}, \ldots, S'_{mL}) \end{cases}$$

To return from each particular sequence $S'_i$ to $S_i$, remove all gaps.
== Graphing approach ==
A general approach when calculating multiple sequence alignments is to use graphs to identify all of the different alignments. When finding alignments via a graph, a complete alignment is created in a weighted graph that contains a set of vertices and a set of edges. Each of the graph edges has a weight based on a certain heuristic that helps to score each alignment or subset of the original graph.
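The conditions in the problem statement can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of any alignment program: the function name `is_valid_msa` and the use of `-` as the gap symbol are choices made here. It tests that all aligned rows share one length, that no column is made up only of gaps, and that removing the gaps from each row returns the original sequence.

```python
GAP = "-"  # assumed gap symbol; the text does not fix one


def is_valid_msa(original, aligned):
    """Check the MSA conditions for aligned rows against the original sequences."""
    if not aligned or len(original) != len(aligned):
        return False
    # All modified sequences must conform to one common length L.
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        return False
    # No column may consist only of gaps.
    for column in zip(*aligned):
        if all(symbol == GAP for symbol in column):
            return False
    # Removing all gaps must give back each original sequence
    # (this also implies L >= max n_i).
    return all(row.replace(GAP, "") == seq for seq, row in zip(original, aligned))


sequences = ["ACGT", "AGT", "CGT"]
alignment = ["ACGT", "A-GT", "-CGT"]
ok = is_valid_msa(sequences, alignment)
```

The check says nothing about how good the alignment is; scoring is a separate question handled by the methods described in the rest of the article.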
=== Tracing alignments ===
When determining the best-suited alignments for each MSA, a trace is usually generated. A trace is a set of realized, or corresponding and aligned, vertices that has a specific weight based on the edges that are selected between corresponding vertices. When choosing traces for a set of sequences, it is necessary to choose a trace with maximum weight to get the best alignment of the sequences.
== Alignment methods ==
Various alignment methods are used within multiple sequence alignment to maximize the scores and correctness of alignments. Each is usually based on a certain heuristic with an insight into the evolutionary process. Most try to replicate evolution to get the most realistic alignment possible and so best predict relations between sequences.
=== Dynamic programming ===
A direct method for producing an MSA uses the dynamic programming technique to identify the globally optimal alignment solution. For proteins, this method usually involves two sets of parameters: a gap penalty and a substitution matrix assigning scores or probabilities to the alignment of each possible pair of amino acids based on the similarity of the amino acids' chemical properties and the evolutionary probability of the mutation. For nucleotide sequences, a similar gap penalty is used, but a much simpler substitution matrix, wherein only identical matches and mismatches are considered, is typical. The scores in the substitution matrix may be either all positive or a mix of positive and negative in the case of a global alignment, but must be both positive and negative in the case of a local alignment. For n individual sequences, the naive method requires constructing the n-dimensional equivalent of the matrix formed in standard pairwise sequence alignment. The search space thus increases exponentially with increasing n and is also strongly dependent on sequence length.
Expressed with the big O notation commonly used to measure computational complexity, a naïve MSA takes $O(\text{Length}^{N_{\text{seqs}}})$ time to produce. Finding the global optimum for n sequences this way has been shown to be an NP-complete problem. In 1989, based on the Carrillo–Lipman algorithm, Altschul introduced a practical method that uses pairwise alignments to constrain the n-dimensional search space. In this approach pairwise dynamic programming alignments are performed on each pair of sequences in the query set, and only the space near the n-dimensional intersection of these alignments is searched for the n-way alignment. The MSA program optimizes the sum of all of the pairs of characters at each position in the alignment (the so-called sum of pair score) and has been implemented in a software program for constructing multiple sequence alignments. In 2019, Hosseininasab and van Hoeve showed that by using decision diagrams, MSA may be modeled in polynomial space complexity.
=== Progressive alignment construction ===
The most widely used approach to multiple sequence alignments uses a heuristic search known as the progressive technique (also known as the hierarchical or tree method), developed by Da-Fei Feng and Doolittle in 1987. Progressive alignment builds up a final MSA by combining pairwise alignments, beginning with the most similar pair and progressing to the most distantly related. All progressive alignment methods require two stages: a first stage in which the relationships between the sequences are represented as a phylogenetic tree, called a guide tree, and a second stage in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree. The initial guide tree is determined by an efficient clustering method such as neighbor-joining or unweighted pair group method with arithmetic mean (UPGMA), and may use distances based on the number of identical two-letter sub-sequences (as in FASTA) rather than a dynamic programming alignment.
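The idea of a guide-tree distance based on shared two-letter sub-sequences can be sketched as follows. This is an illustrative simplification and not the formula used by FASTA, Clustal, or any other program: the function names and the normalization (shared k-mers divided by the number of k-mers in the shorter sequence) are assumptions made here. The closest pair under such a distance is where a progressive method would start.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=2):
    """Count every overlapping k-letter sub-sequence of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=2):
    """Distance in [0, 1]: 0 when all k-mers are shared, 1 when none are."""
    possible = min(len(a), len(b)) - k + 1
    if possible <= 0:
        return 1.0
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / possible


def most_similar_pair(seqs, k=2):
    """Return the index pair with the smallest k-mer distance."""
    pairs = combinations(range(len(seqs)), 2)
    return min(pairs, key=lambda p: kmer_distance(seqs[p[0]], seqs[p[1]], k))


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
first_pair = most_similar_pair(seqs)
```

Counting shared k-mers avoids running a full dynamic programming alignment for every pair, which is why this kind of distance is attractive when only a rough tree is needed.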
Progressive alignments are not guaranteed to be globally optimal. The primary problem is that when errors are made at any stage in growing the MSA, these errors are then propagated through to the final result. Performance is also particularly bad when all of the sequences in the set are rather distantly related. Most modern progressive methods modify their scoring function with a secondary weighting function that assigns scaling factors to individual members of the query set in a nonlinear fashion based on their phylogenetic distance from their nearest neighbors. This corrects for non-random selection of the sequences given to the alignment program. Progressive alignment methods are efficient enough to implement on a large scale for many (100s to 1000s) sequences. A popular progressive alignment method has been the Clustal family. ClustalW is used extensively for phylogenetic tree construction, in spite of the author's explicit warnings that unedited alignments should not be used in such studies, and as input for protein structure prediction by homology modeling. The European Bioinformatics Institute (EMBL-EBI) announced that ClustalW2 would expire in August 2015. They recommend Clustal Omega, which is based on seeded guide trees and HMM profile-profile techniques for protein alignments. An alternative tool for progressive DNA alignments is multiple alignment using fast Fourier transform (MAFFT). Another common progressive alignment method, T-Coffee, is slower than Clustal and its derivatives but generally produces more accurate alignments for distantly related sequence sets. T-Coffee calculates pairwise alignments by combining the direct alignment of the pair with indirect alignments that align each sequence of the pair to a third sequence. It uses the output from Clustal as well as another local alignment program, LALIGN, which finds multiple regions of local alignment between two sequences.
The resulting alignment and phylogenetic tree are used as a guide to produce new and more accurate w
Markov chain Monte Carlo
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it, i.e. the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution. Markov chain Monte Carlo methods are used to study probability distributions that are too complex or too high-dimensional to study with analytic techniques alone. Various algorithms exist for constructing such Markov chains, including the Metropolis–Hastings algorithm.
== General explanation ==
Markov chain Monte Carlo methods create samples from a continuous random variable, with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance. In practice, an ensemble of chains is generally developed, starting from a set of points arbitrarily chosen and sufficiently distant from each other. These chains are stochastic processes of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities. Random walk Monte Carlo methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are autocorrelated. Correlation between samples introduces the need to use the Markov chain central limit theorem when estimating the error of mean values. These algorithms create Markov chains such that they have an equilibrium distribution which is proportional to the function given.
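As an illustration of a walker whose equilibrium distribution is proportional to a known function, here is a minimal sketch of a random-walk Metropolis sampler. The function name `metropolis`, the Gaussian proposal, the step size, and the standard normal target are all assumptions made for this example; it is one simple instance of the Metropolis–Hastings family, not a reference implementation.

```python
import math
import random


def metropolis(log_density, start, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis: log_density only needs to be known up to a constant."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current position.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        # A rejected move repeats the current state, so samples are autocorrelated.
        samples.append(x)
    return samples


# Target: standard normal density, given only up to its normalizing constant.
samples = metropolis(lambda x: -0.5 * x * x, start=0.0, n_steps=20000, seed=1)
estimated_mean = sum(samples) / len(samples)
```

Because consecutive samples are correlated, the error of `estimated_mean` shrinks more slowly than it would for the same number of independent draws.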
== History == The development of MCMC methods is deeply rooted in the early exploration of Monte Carlo (MC) techniques in the mid-20th century, particularly in physics. These developments were marked by the Metropolis algorithm proposed by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller in 1953, which was designed to tackle high-dimensional integration problems using early computers. Then in 1970, W. K. Hastings generalized this algorithm and inadvertently introduced the component-wise updating idea, later known as Gibbs sampling. Simultaneously, the theoretical foundations for Gibbs sampling were being developed, such as the Hammersley–Clifford theorem from Julian Besag's 1974 paper. Although the seeds of MCMC were sown earlier, including the formal naming of Gibbs sampling in image processing by Stuart Geman and Donald Geman (1984) and the data augmentation method by Martin A. Tanner and Wing Hung Wong (1987), its "revolution" in mainstream statistics largely followed demonstrations of the universality and ease of implementation of sampling methods (especially Gibbs sampling) for complex statistical (particularly Bayesian) problems, spurred by increasing computational power and software like BUGS. This transformation was accompanied by significant theoretical advancements, such as Luke Tierney's (1994) rigorous treatment of MCMC convergence, and Jun S. Liu, Wong, and Augustine Kong's (1994, 1995) analysis of Gibbs sampler structure. Subsequent developments further expanded the MCMC toolkit, including particle filters (Sequential Monte Carlo) for sequential problems, Perfect sampling aiming for exact simulation (Jim Propp and David B. Wilson, 1996), RJMCMC (Peter J. Green, 1995) for handling variable-dimension models, and deeper investigations into convergence diagnostics and the central limit theorem. 
Overall, the evolution of MCMC represents a paradigm shift in statistical computation, enabling the analysis of numerous previously intractable complex models and continually expanding the scope and impact of statistics.
== Mathematical setting ==
Suppose $(X_n)$ is a Markov chain in the general state space $\mathcal{X}$ with specific properties. We are interested in the limiting behavior of the partial sums:

$$S_n(h) = \dfrac{1}{n}\sum_{i=1}^{n} h(X_i)$$

as $n$ goes to infinity. In particular, we hope to establish the Law of Large Numbers and the Central Limit Theorem for MCMC. In the following, we state some definitions and theorems necessary for the important convergence results. In short, we need the existence of an invariant measure and Harris recurrence to establish the Law of Large Numbers for MCMC (Ergodic Theorem). We also need aperiodicity, irreducibility and extra conditions such as reversibility to ensure that the Central Limit Theorem holds for MCMC.
=== Irreducibility and aperiodicity ===
Recall that in the discrete setting, a Markov chain is said to be irreducible if it is possible to reach any state from any other state in a finite number of steps with positive probability. However, in the continuous setting, point-to-point transitions have zero probability. In this case, φ-irreducibility generalizes irreducibility by using a reference measure $\varphi$ on the measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$.
Definition (φ-irreducibility) Given a measure $\varphi$ defined on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, the Markov chain $(X_n)$ with transition kernel $K(x, y)$ is φ-irreducible if, for every $A \in \mathcal{B}(\mathcal{X})$ with $\varphi(A) > 0$, there exists $n$ such that $K^n(x, A) > 0$ for all $x \in \mathcal{X}$ (equivalently, $P_x(\tau_A < \infty) > 0$, where $\tau_A = \inf\{n \geq 1 \,;\, X_n \in A\}$ is the first $n$ for which the chain enters the set $A$). This is a more general definition of irreducibility for a Markov chain in a non-discrete state space. In the discrete case, an irreducible Markov chain is said to be aperiodic if it has period 1. Formally, the period of a state $\omega \in \mathcal{X}$ is defined as:

$$d(\omega) := \gcd\{m \geq 1 \,;\, K^m(\omega, \omega) > 0\}$$

For the general (non-discrete) case, we define aperiodicity in terms of small sets:
Definition (Cycle length and small sets) A φ-irreducible Markov chain $(X_n)$ has a cycle of length $d$ if there exists a small set $C$, an associated integer $M$, and a probability distribution $\nu_M$ such that $d$ is the greatest common divisor of:

$$\{m \geq 1 \,;\, \exists\, \delta_m > 0 \text{ such that } C \text{ is small for } \nu_m \geq \delta_m \nu_M\}.$$

A set $C$ is called small if there exists $m \in \mathbb{N}^{*}$ and a nonzero measure $\nu_m$ such that:

$$K^m(x, A) \geq \nu_m(A), \quad \forall x \in C,\ \forall A \in \mathcal{B}(\mathcal{X}).$$

=== Harris recurrent ===
Definition (Harris recurrence) A set $A$ is Harris recurrent if $P_x(\eta_A = \infty) = 1$ for all $x \in A$, where $\eta_A = \sum_{n=1}^{\infty} \mathbb{I}_A(X_n)$ is the number of visits of the chain $(X_n)$ to the set $A$. The chain $(X_n)$ is said to be Harris recurrent if there exists a measure $\psi$ such that the chain is $\psi$-irreducible and every measurable set $A$ with $\psi(A) > 0$ is Harris recurrent. A useful criterion for verifying Harris recurrence is the following:
Proposition If for every $A \in \mathcal{B}(\mathcal{X})$, we have $P_x(\tau_A < \infty) = 1$ for every $x \in A$, then $P_x(\eta_A = \infty) = 1$ for all $x \in \mathcal{X}$, and the chain $(X_n)$ is Harris recurrent.
This definition is only needed when the state space $\mathcal{X}$ is uncountable. In the countable case, recurrence corresponds to $\mathbb{E}_x[\eta_x] = \infty$, which is equivalent to $P_x(\tau_x < \infty) = 1$ for all $x \in \mathcal{X}$
Katia Sycara
Ekaterini Panagiotou Sycara (Greek: Κάτια Συκαρά) is a Greek computer scientist. She is an Edward Fredkin Research Professor of Robotics in the Robotics Institute, School of Computer Science at Carnegie Mellon University, internationally known for her research in artificial intelligence, particularly in the fields of negotiation, autonomous agents and multi-agent systems. She directs the Advanced Agent-Robotics Technology Lab at the Robotics Institute, Carnegie Mellon University. She also serves as academic advisor for PhD students at both the Robotics Institute and the Tepper School of Business.
== Education and early life ==
Born in Greece, she went to the United States to pursue advanced education through various scholarships, including a Fulbright (1965-1969). She received a B.S. in applied mathematics from Brown University, an M.S. in electrical engineering from the University of Wisconsin–Milwaukee, and a PhD in computer science from Georgia Institute of Technology.
== Research and career ==
Sycara is a pioneer in the fields of semantic web, case-based reasoning, autonomous agents and multi-agent systems. She has authored or co-authored more than 700 technical papers dealing with multi-agent systems, software agents, web services, semantic web, human–computer interaction, human-robot interaction, negotiation, case-based reasoning and the application of these techniques to crisis action planning, scheduling, manufacturing, healthcare management, financial planning and e-commerce. She has led multimillion-dollar research efforts funded by DARPA, NASA, AFOSR, ONR, AFRL, NSF and industry. Through an ONR MURI program and through the COABS DARPA program, Prof. Sycara's group has developed the RETSINA multiagent infrastructure, a toolkit that enables the development of heterogeneous software agents that can dynamically coordinate in open information environments (e.g. the Internet).
RETSINA has been used in multiple applications, including supporting human joint mission teams for crisis response; creating autonomous agents for situation awareness and information fusion; financial portfolio management, negotiations and coalition formation for e-commerce; and coordinating robots for Urban Search and Rescue. Sycara is one of the contributors to the development of OWL-S, the DARPA-sponsored language for Semantic Web services, as well as matchmaking and brokering software for agent discovery, service integration and semantic interoperation.
=== Academic service ===
Sycara is the founding Editor-in-Chief of the journal Autonomous Agents and Multi-Agent Systems; Editor-in-Chief of the Springer Series on Agents; and Area Editor of AI and Management Science for the journal "Group Decision and Negotiation." She is a member of the editorial board of the Kluwer book series on "Multiagent Systems, Artificial Societies and Simulated Organizations"; a member of the editorial boards of the journals "Agent Oriented Software Engineering", "Web Intelligence and Agent Technologies", "Journal of Infonomics", "Fundamenta Informaticae", and "Concurrent Engineering: Research and Applications"; and was a member of the editorial board of the "ETAI journal on the Semantic Web" (1998–2001). She was on the editorial boards of "IEEE Intelligent Systems and their Applications" (1992–1996) and "AI in Engineering" (1990–1996). She has been a member of the Scientific Advisory Board of France Telecom (2003–2009); a member of the Scientific Advisory Board of the Institute of Informatics and Telecommunications of the Greek National Research Center Demokritos (2004–2012); a member of the AAAI Executive Council (1996–99); a member of the OASIS Technical Committee on the development of UDDI (Universal Description and Discovery for Interoperability) software, which is an industry standard; and an invited expert for the W3C (World Wide Web Consortium) Working Group on Web Services Architecture.
She was a founding member of the Board of Directors of the International Foundation of Multiagent Systems (IFMAS), and a founding member of the Semantic Web Science Association. Sycara served as the program chair of the Second International Semantic Web Conference (ISWC 2003); general chair of the Second International Conference on Autonomous Agents (Agents 98); chair of the Steering Committee of the Agents Conference (1999–2001); scholarship chair of AAAI (1993–1999); and the US co-chair for the US-Europe Semantic Web Services Initiative.
=== Awards and honors ===
Sycara is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), and a Fellow of the American Association for Artificial Intelligence (AAAI). Sycara is the recipient of the 2002 ACM/SIGART Agents Research Award. She is also the recipient of the 2015 Group Decision and Negotiation (GDN) Award of the Institute for Operations Research and the Management Sciences (INFORMS) GDN Section for her outstanding contributions to the field of group decision and negotiation. According to the citation of the award: Katia Sycara is widely acknowledged as one of the leading researchers in the field of autonomous software agents and in particular on problems related to joint decision making and negotiations of such agents. Her work is characterized by a unique combination of methods from Artificial Intelligence and research on human negotiations, and thus has contributed to significant advances in both fields. Sycara's robot teams have won multiple international awards. In the 2005 Robocup Urban Search and Rescue (US Open) held in Atlanta, her team won the First-in-Class Award for Autonomy and the First-in-Class Award for Mobility. Two years later, again in Atlanta, she led another team that became world champions in the 2007 International Robocup Search and Rescue Simulation League Competition.
In 2008, her robotic team placed third in the Worldwide Robocup Championship Competition in the Urban Search and Rescue Virtual robots League held in Beijing, China. In 2005, she received the Outstanding Alumnus Award from the University of Wisconsin–Milwaukee. She was awarded an Honorary Doctorate from the University of the Aegean in 2004.
Spike-and-slab regression
Spike-and-slab regression is a type of Bayesian linear regression in which a particular hierarchical prior distribution for the regression coefficients is chosen such that only a subset of the possible regressors is retained. The technique is particularly useful when the number of possible predictors is larger than the number of observations. The idea of the spike-and-slab model was originally proposed by Mitchell & Beauchamp (1988). The approach was further significantly developed by Madigan & Raftery (1994) and George & McCulloch (1997). A recent and important contribution to this literature is Ishwaran & Rao (2005).
== Model description ==
Suppose we have $P$ possible predictors in some model. Vector $\gamma$ has a length equal to $P$ and consists of zeros and ones. This vector indicates whether a particular variable is included in the regression or not. If no specific prior information on initial inclusion probabilities of particular variables is available, a Bernoulli prior distribution is a common default choice. Conditional on a predictor being in the regression, we identify a prior distribution for the model coefficient that corresponds to that variable ($\beta$). A common choice at that step is to use a normal prior with a mean equal to zero and a large variance calculated based on $(X^{T}X)^{-1}$ (where $X$ is a design matrix of explanatory variables of the model). A draw of $\gamma$ from its prior distribution is a list of the variables included in the regression. Conditional on this set of selected variables, we take a draw from the prior distribution of the regression coefficients (if $\gamma_i = 1$ then $\beta_i \neq 0$ and if $\gamma_i = 0$ then $\beta_i = 0$). $\beta_\gamma$ denotes the subset of $\beta$ for which $\gamma_i = 1$. In the next step, we calculate a posterior probability for both inclusion and coefficients by applying a standard statistical procedure. All steps of the described algorithm are repeated thousands of times using the Markov chain Monte Carlo (MCMC) technique.
As a result, we obtain a posterior distribution of $\gamma$ (variable inclusion in the model), $\beta$ (regression coefficient values) and the corresponding prediction of $y$. The model got its name (spike-and-slab) due to the shape of the two prior distributions. The "spike" is the probability of a particular coefficient in the model being zero. The "slab" is the prior distribution for the regression coefficient values. An advantage of Bayesian variable selection techniques is that they are able to make use of prior knowledge about the model. In the absence of such knowledge, some reasonable default values can be used; to quote Scott and Varian (2013): "For the analyst who prefers simplicity at the cost of some reasonable assumptions, useful prior information can be reduced to an expected model size, an expected $R^2$, and a sample size $\nu$ determining the weight given to the guess at $R^2$." Some researchers suggest the following default values: $R^2 = 0.5$, $\nu = 0.01$, and $\pi = 0.5$ (parameter of a prior Bernoulli distribution).
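The two-part prior can be illustrated with a short sketch that draws one inclusion vector and one coefficient vector from it. This is a simplification under stated assumptions: the function name `draw_spike_and_slab_prior` is hypothetical, each predictor gets the same inclusion probability, and the slab is an independent normal with a fixed standard deviation `slab_sd` rather than a variance derived from the design matrix. Posterior computation by MCMC is not shown.

```python
import random


def draw_spike_and_slab_prior(n_predictors, inclusion_prob=0.5, slab_sd=10.0, seed=0):
    """Draw (gamma, beta) once from a simplified spike-and-slab prior."""
    rng = random.Random(seed)
    # Spike part: gamma_i ~ Bernoulli(inclusion_prob) decides whether predictor i is included.
    gamma = [1 if rng.random() < inclusion_prob else 0 for _ in range(n_predictors)]
    # Slab part: included coefficients come from a wide zero-mean normal;
    # excluded coefficients are exactly zero.
    beta = [rng.gauss(0.0, slab_sd) if g == 1 else 0.0 for g in gamma]
    return gamma, beta


gamma, beta = draw_spike_and_slab_prior(n_predictors=8, inclusion_prob=0.5, seed=42)
```

With `inclusion_prob=0.5`, about half of the predictors are expected to be included in any single draw, which matches the default value of the Bernoulli parameter quoted above.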
Vasant Honavar
Vasant G. Honavar is an Indian-American computer scientist, and artificial intelligence, machine learning, big data, data science, causal inference, knowledge representation, bioinformatics and health informatics researcher and professor. == Early life and education == Vasant Honavar was born at Pune, India to Bhavani G. and Gajanan N. Honavar. He received his early education at the Vidya Vardhaka Sangha High School and M.E.S. College in Bangalore, India. He received a B.E. in Electronics & Communications Engineering from the B.M.S. College of Engineering in Bangalore, India in 1982, when it was affiliated with Bangalore University, an M.S. in electrical and computer engineering in 1984 from Drexel University, and an M.S. in computer science in 1989, and a Ph.D. in 1990, respectively, from the University of Wisconsin–Madison, where he studied Artificial Intelligence and worked with Leonard Uhr. == Career == Honavar is on the faculty of Informatics and Intelligent Systems Department in the Penn State College of Information Sciences and Technology at Pennsylvania State University where he currently holds the Dorothy Foehr Huck and J. Lloyd Huck Chair in Biomedical Data Sciences and Artificial Intelligence and previously held the Edward Frymoyer Endowed Chair in Information Sciences and Technology. He serves on the faculties of the graduate programs in Computer Science, Informatics, Bioinformatics and Genomics, Neuroscience, Operations Research, Public Health Sciences, and of undergraduate programs in Data Science and Artificial Intelligence methods and applications. Honavar serves as the director of the Artificial Intelligence Research Laboratory, Director of Strategic Initiatives for the Institute for Computational and Data Sciences and the director of the Center for Artificial Intelligence Foundations and Scientific Applications at Pennsylvania State University. Honavar served on the Leadership Team of the Northeast Big Data Innovation Hub. 
Honavar served on the Computing Research Association's Computing Community Consortium Council during 2014-2017, where he chaired the task force on Convergence of Data and Computing, and was a member of the task force on Artificial Intelligence. Honavar was the first Sudha Murty Distinguished Visiting Chair of Neurocomputing and Data Science by the Indian Institute of Science, Bangalore, India. Honavar was named a Distinguished Member of the Association for Computing Machinery for "outstanding scientific contributions to computing"; and elected a Fellow of the American Association for the Advancement of Science for his "distinguished research contributions and leadership in data science". As a Program Director in the Information Integration and Informatics program in the Information and Intelligent Systems Division of the Computer and Information Science and Engineering Directorate of the US National Science Foundation during 2010-13, Honavar led the Big Data Program. Honavar was a professor of computer science at Iowa State University where he led the Artificial Intelligence Research Laboratory which he founded in 1990 and was instrumental in establishing an interdepartmental graduate program in Bioinformatics and Computational Biology (and served as its Chair during 2003–2005). Honavar has held visiting professorships at Carnegie Mellon University, the University of Wisconsin–Madison, and at the Indian Institute of Science. == Research == Honavar's research has contributed to advances in artificial intelligence, machine learning, causal inference, knowledge representation, neural networks, semantic web, big data analytics, and bioinformatics and computational biology. He was a program chair of the Association for the Advancement of Artificial Intelligence(AAAI)'s 36th Conference on Artificial Intelligence. He has published over 300 research articles, including many highly cited ones, as well as several books on these topics. 
His recent work has focused on federated machine learning algorithms for constructing predictive models from distributed data and linked open data, learning predictive models from high dimensional longitudinal data, reasoning with federated knowledge bases, detecting algorithmic bias, big data analytics, analysis and prediction of protein-protein, protein-RNA, and protein-DNA interfaces and interactions, social network analytics, health informatics, secrecy-preserving query answering, representing and reasoning about preferences, and causal inference from complex, e.g., relational, data, large language models, diffusion models, and meta analysis. Honavar has been active in fostering national and international scientific collaborations in Artificial Intelligence, Data Sciences, and their applications in addressing national, international, and societal priorities in accelerating science, improving health, transforming agriculture through partnerships that bring together academia, non-profits, and industry. He is also active in making the science policy case for major national research initiatives such as AI for accelerating science and AI for combating the epidemic of diseases of despair. 
== Honors ==
* National Science Foundation Director's Award for Superior Accomplishment, 2013
* National Science Foundation Director's Award for Collaborative Integration, 2012
* Margaret Ellen White Graduate Faculty Award, Iowa State University, 2011
* Outstanding Career Achievement in Research Award, College of Liberal Arts and Sciences, Iowa State University, 2008
* Regents Award for Faculty Excellence, Iowa Board of Regents, 2007
* Edward Frymoyer Endowed Chair in Information Sciences and Technology, Penn State College of Information Sciences and Technology, Pennsylvania State University, 2013
* Senior Faculty Research Excellence Award, Penn State College of Information Sciences and Technology, Pennsylvania State University, 2016
* 125 People of Impact, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 2016
* Sudha Murty Distinguished (Visiting) Chair of Neurocomputing and Data Science, Indian Institute of Science, 2016-2021
* ACM Distinguished Member, 2018
* AAAS Fellow, American Association for the Advancement of Science, 2018
* EAI Fellow, European Alliance for Innovation, 2019
* Dorothy Foehr Huck and J. Lloyd Huck Chair in Biomedical Data Sciences and Artificial Intelligence, Pennsylvania State University, 2021