
This thesis develops new methods for multivariate testing and multiple testing with high-dimensional, small-sample-size data. In Chapter 2, we propose a likelihood ratio test framework for testing normal mean vectors in high-dimensional data under two common scenarios: the one-sample test and the two-sample test with equal covariance matrices. We derive the test statistics under the assumption that the covariance matrices are diagonal. Compared with the diagonal Hotelling's tests, our proposed test statistics display some interesting characteristics. In particular, they are a summation of the log-transformed squared t-statistics rather than a direct summation of those components. More importantly, deriving the asymptotic normality of our test statistics under the null and local alternative hypotheses does not require the covariance matrices to be diagonal. As a consequence, our proposed test methods are very flexible and readily applicable in practice. Monte Carlo simulations and a real data analysis are also carried out to demonstrate the advantages of the proposed methods. In Chapter 3, we propose a pairwise Hotelling's method for testing high-dimensional mean vectors. The new test statistics strike a compromise between using all the correlations and completely abandoning them. To achieve this goal, we perform a screening procedure, pick out the paired covariates with strong correlations, and construct a classical Hotelling's statistic for each pair. For the individual covariates without strong correlations with others, we apply squared t-statistics to account for their respective contributions to the multivariate testing problem. As a consequence, our proposed test statistics combine the collected pairwise Hotelling's test statistics and squared t-statistics.
The asymptotic normality of our test statistics under the null and local alternative hypotheses is also derived under some regularity conditions. Numerical studies and two real data examples demonstrate the efficacy of our pairwise Hotelling's test. In Chapter 4, we propose a regularized t distribution and explore its applications in multiple testing. The motivation for this topic dates back to microarray studies, where the expression levels of thousands of genes are measured simultaneously by microarray technology. To identify genes that are differentially expressed between two or more groups, one needs to conduct a hypothesis test for each gene. However, as microarray experiments often have only a small number of replicates, Student's t-tests using the sample means and standard deviations may suffer from low power for detecting differentially expressed genes. To overcome this problem, we first propose a regularized t distribution and derive its statistical properties, including the probability density function and the moments. The noncentral regularized t distribution is also introduced for the power analysis. To demonstrate the usefulness of the proposed test, we apply the regularized t distribution to the gene expression detection problem. Simulation studies and two real data examples show that the regularized t-test outperforms existing tests, including Student's t-test and the Bayesian t-test, in a wide range of settings, in particular when the sample size is small.
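The contrast drawn in the Chapter 2 summary, a sum of log-transformed squared t-statistics versus a direct sum, can be illustrated with a minimal Python sketch for the one-sample case. It assumes the standard single-coordinate identity for a normal mean with unknown variance, -2 log(likelihood ratio) = n log(1 + t^2 / (n - 1)), summed over coordinates as a diagonal covariance would imply. The function names are hypothetical, and the centering and scaling needed for the asymptotic normal approximation are not reproduced here, so this is not the thesis's exact statistic.

```python
import math
from statistics import mean, variance


def t_squared_per_coordinate(sample, mu0):
    """Squared one-sample t-statistic for each coordinate.

    sample: list of n observations, each a list of p floats.
    mu0: hypothesized mean vector of length p.
    """
    n = len(sample)
    t2 = []
    for j, m0 in enumerate(mu0):
        column = [row[j] for row in sample]
        # statistics.variance uses the unbiased (n - 1) denominator.
        t2.append(n * (mean(column) - m0) ** 2 / variance(column))
    return t2


def diagonal_hotelling(t2):
    """Direct summation of the squared t-statistics."""
    return sum(t2)


def diagonal_log_lrt(t2, n):
    """Summation of log-transformed squared t-statistics.

    Assumption: each term is n * log(1 + t^2 / (n - 1)), the usual
    -2 log likelihood ratio for a single normal mean.
    """
    return sum(n * math.log1p(t / (n - 1)) for t in t2)


if __name__ == "__main__":
    data = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    t2 = t_squared_per_coordinate(data, [0.0, 2.0])
    print(diagonal_hotelling(t2), diagonal_log_lrt(t2, len(data)))
```

Because log(1 + x) grows much more slowly than x, a single very large t-statistic contributes less to the log-transformed sum than to the direct sum.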
This volume contains a collection of research articles on multivariate statistical methods, encompassing both theoretical advances and emerging applications in a variety of scientific disciplines. It serves as a tribute to Professor S N Roy, an eminent statistician who has made seminal contributions to the area of multivariate statistical methods, on his birth centenary. In the area of emerging applications, the topics include bioinformatics, categorical data and clinical trials, econometrics, longitudinal data analysis, microarray data analysis, sample surveys, statistical process control, etc. Researchers, professionals and advanced graduates will find the book an essential resource for modern developments in theory as well as for innovative and emerging important applications in the area of multivariate statistical methods.
In the context of large-scale multiple testing, hypotheses are often accompanied by certain prior information. In Chapter 2, we present a single-index modulated multiple testing procedure, which maintains control of the false discovery rate while incorporating prior information, by assuming the availability of a bivariate p-value for each hypothesis. To find the optimal rejection region for the bivariate p-value, we propose a criterion based on the ratio of the probability density functions of the bivariate p-value under the true null and non-null. In the bivariate normal setting, this criterion further motivates us to project the bivariate p-value to a single-index p-value, for a wide range of directions. The true null distribution of the single-index p-value is estimated via parametric and nonparametric approaches, leading to two procedures for estimating and controlling the false discovery rate. To derive the optimal projection direction, we propose a new approach based on power comparison, which is further shown to be consistent under some mild conditions. Multiple testing based on chi-squared test statistics is commonly used in many scientific fields such as genomics research and brain imaging studies. However, the challenges associated with designing a formal testing procedure when there exists a general dependence structure across the chi-squared test statistics have not been well addressed. In Chapter 3, we propose a Factor Connected procedure to fill this gap. We first adopt a latent factor structure to construct a testing framework for approximating the false discovery proportion (FDP) for a large number of highly correlated chi-squared test statistics with finite degrees of freedom k.
The testing framework is then connected to simultaneously testing k linear constraints in a large dimensional linear factor model involving some observable and unobservable common factors, resulting in a consistent estimator of FDP based on the associated unadjusted p-values.
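As a point of reference for what controlling the false discovery rate means in practice, the following Python sketch implements the classical Benjamini-Hochberg step-up rule on ordinary univariate p-values. This is a standard baseline swapped in for illustration, not the single-index or Factor Connected procedures described above: it uses no prior information and its guarantee relies on independence or positive dependence among the p-values. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Finds the largest rank k such that the k-th smallest p-value is at most
    alpha * k / m, then rejects the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, alpha=0.05))
```

The rule is step-up: a hypothesis whose p-value misses its own threshold is still rejected when a larger p-value meets a later threshold.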
Complex multivariate testing problems are frequently encountered in many scientific disciplines, such as engineering, medicine and the social sciences. As a result, modern statistics needs permutation testing for complex data with low sample size and many variables, especially in observational studies. The authors give a general overview of permutation tests, with a focus on recent theoretical advances in univariate and multivariate complex permutation testing problems, and bring the reader up to date with current thinking. Key Features: Examines the most up-to-date methodologies of univariate and multivariate permutation testing. Includes extensive software codes in MATLAB, R and SAS, featuring worked examples, and uses real case studies from both experimental and observational studies. Includes a standalone free software NPC Test Release 10 with a graphical interface which allows practitioners from every scientific field to easily implement almost all complex testing procedures included in the book. Presents and discusses solutions to the most important and frequently encountered real problems in multivariate analyses. Provides a supplementary website containing all of the data sets examined in the book along with ready-to-use software codes. Together with a wide set of application cases, the authors present a thorough theory of permutation testing, with formal descriptions and proofs, and analyse real case studies. Practitioners and researchers working in different scientific fields such as engineering, biostatistics, psychology or medicine will benefit from this book.
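For readers new to the idea, the basic mechanics of a permutation test can be sketched in a few lines of Python. The example covers only the simplest univariate two-sample case with a difference-in-means statistic and Monte Carlo sampling of permutations. It is not the book's multivariate methodology or its NPC Test software, and the function name, default number of permutations and seed are illustrative assumptions.

```python
import random


def permutation_test(x, y, n_permutations=10000, seed=0):
    """Two-sided two-sample permutation test on the difference in means.

    Returns a Monte Carlo p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of the group labels
        group_x, group_y = pooled[:n_x], pooled[n_x:]
        diff = abs(sum(group_x) / n_x - sum(group_y) / len(group_y))
        if diff >= observed:
            extreme += 1
    # The add-one correction counts the observed labeling itself, which
    # keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    control = [4.1, 5.0, 4.7, 5.3, 4.9]
    treated = [5.8, 6.1, 5.5, 6.4, 6.0]
    print(permutation_test(control, treated, n_permutations=5000))
```

No distributional assumption is needed beyond exchangeability of the observations under the null hypothesis, which is what makes the approach attractive for small samples.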
In statistics, the Behrens–Fisher problem is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples. In his 1935 paper, Fisher outlined an approach to the Behrens–Fisher problem. Since high-speed computers were not available in Fisher’s time, this approach was not implementable and was soon forgotten. Fortunately, now that high-speed computers are available, this approach can easily be implemented using just a desktop or a laptop computer. Furthermore, Fisher’s approach was proposed for univariate samples, but it can also be generalized to the multivariate case. In this monograph, we present the solution to the aforementioned multivariate generalization of the Behrens–Fisher problem. We start out by presenting a test of multivariate normality, proceed to test(s) of equality of covariance matrices, and end with our solution to the multivariate Behrens–Fisher problem. All methods proposed in this monograph will include both the randomly-incomplete-data case and the complete-data case. Moreover, all methods considered in this monograph will be tested using both simulations and examples.
Experimental Design: Procedures for Behavioral Sciences, Fourth Edition is a classic text with a reputation for accessibility and readability. It has been revised and updated to make learning design concepts even easier. Roger E. Kirk shows how three simple experimental designs can be combined to form a variety of complex designs. He provides diagrams illustrating how subjects are assigned to treatments and treatment combinations. New terms are emphasized in boldface type, there are summaries of the advantages and disadvantages of each design, and real-life examples show how the designs are used.
The collection and analysis of data play an important role in many fields of science and technology, such as computational biology, quantitative finance, information engineering, machine learning, neuroscience, medicine, and the social sciences. Especially in the era of big data, researchers can easily collect data characterised by massive dimensions and complexity. In celebration of Professor Kai-Tai Fang’s 80th birthday, we present this book, which furthers new and exciting developments in modern statistical theories, methods and applications. The book features four review papers on Professor Fang’s numerous contributions to the fields of experimental design, multivariate analysis, data mining and education. It also contains twenty research articles contributed by prominent and active figures in their fields. The articles cover a wide range of important topics such as experimental design, multivariate analysis, data mining, hypothesis testing and statistical models.
Driven by a wide range of contemporary applications, statistical inference for covariance structures has been an active area of current research in high-dimensional statistics. This review provides a selective survey of some recent developments in hypothesis testing for high-dimensional covariance structures, including global testing for the overall pattern of the covariance structures and simultaneous testing of a large collection of hypotheses on the local covariance structures with false discovery proportion and false discovery rate control. Both one-sample and two-sample settings are considered. The specific testing problems discussed include global testing for the covariance, correlation, and precision matrices, and multiple testing for the correlations, Gaussian graphical models, and differential networks.
The book brings together experts working in public health and multi-disciplinary areas to present recent issues in statistical methodological development and their applications. This timely book will impact model development and data analyses of public health research across a wide spectrum of analysis. Data and software used in the studies are available for the reader to replicate the models and outcomes. The fifteen chapters range in focus from techniques for dealing with missing data with Bayesian estimation, health surveillance and population definition and implications in applied latent class analysis, to multiple comparison and meta-analysis in public health data. Researchers in biomedical and public health research will find this book to be a useful reference and it can be used in graduate level classes.
This book presents the proceedings of the 2nd Pacific Rim Statistical Conference for Production Engineering: Production Engineering, Big Data and Statistics, which took place at Seoul National University in Seoul, Korea in December, 2016. The papers included discuss a wide range of statistical challenges, methods and applications for big data in production engineering, and introduce recent advances in relevant statistical methods.