Genetics has been an important tool for discovering new aspects of biology across life. In humans, there is growing momentum behind the application of this knowledge to drive innovation in clinical care, most notably through developments in precision medicine. Nowhere has the impact of genetics on clinical practice been more striking than in the field of rare disorders. For most of these conditions, individual disease susceptibility is influenced by DNA sequence variation in a single or a small number of genes. In contrast, most common disorders are multifactorial and are caused by a complex interplay of multiple genetic, environmental and stochastic factors. The longstanding division of human disease genetics into rare and common components has obscured the continuum of human traits and echoes aspects of the century-old debate between the Mendelian and biometric views of human genetics. In this article, we discuss the differences in data and concepts between rare and common disease genetics. Opportunities to unify these two areas are noted and the importance of adopting a holistic perspective that integrates diverse genetic and environmental factors is discussed.
This chapter discusses several key directions, such as data analytics in cyber-physical systems, multidomain mining, machine learning concepts such as deep learning and generative adversarial networks, and the challenges of model reuse. Last but not least, the chapter closes with thoughts on ethical thinking in the data analytics process.
Chapter one sets the context of linked data for research. It describes the ways in which linked data is being used to improve diagnosis, treatment and healthcare delivery and to understand the drivers of health. The advantages of using linked data for research are discussed. The chapter surveys the kinds of data currently being linked for research and different linkage methods and considers the potential and challenges for future international data linkage.
Health research around the world relies on access to data, and many of the most valuable, reliable, and comprehensive data collections are held by governments. These collections, which contain data on whole populations, are a powerful tool in the hands of researchers, especially when they are linked and analyzed, and can help to address “wicked problems” in health and emerging global threats such as COVID-19. At the same time, these data collections contain sensitive information that must only be used in ways that respect the values, interests, and rights of individuals and their communities. Sharing Linked Data for Health Research provides a template for allowing research access to government data collections in a regulatory environment designed to build social license while supporting the research enterprise.
Industrial Data Analytics needs access to huge amounts of data, which are scattered across different IT systems. As part of an integrated reference kit for Industrial Data Analytics, a data backend system is therefore needed that provides access to these data and offers solutions for their extraction and management, together with an analysis pipeline. This paper presents an approach for such a data backend system.
The authors explain in this work a new approach to observing and controlling linear systems whose inputs and outputs are not fixed in advance. They cover a class of linear time-invariant state/signal system that is general enough to include most of the standard classes of linear time-invariant dynamical systems, but simple enough that it is easy to understand the fundamental principles. They begin by explaining the basic theory of finite-dimensional and bounded systems in a way suitable for graduate courses in systems theory and control. They then proceed to the more advanced infinite-dimensional setting, opening up new ways for researchers to study distributed parameter systems, including linear port-Hamiltonian systems and boundary triplets. They include the general non-passive part of the theory in continuous and discrete time, and provide a short introduction to the passive situation. Numerous examples from circuit theory are used to illustrate the theory.
Low-accruing clinical trials delay translation of research breakthroughs into the clinic, expose participants to risk without providing meaningful clinical insight, increase the cost of therapies, and waste limited resources. By tracking patient accrual, Clinical and Translational Science Awards hubs can identify at-risk studies and provide them the support needed to reach recruitment goals and maintain financial solvency. However, tracking accrual has proved challenging because relevant patient- and protocol-level data often reside in siloed systems. To address this fragmentation, in September 2020 the South Carolina Clinical and Translational Research Institute, with an academic home at the Medical University of South Carolina, implemented a clinical trial management system (CTMS), with its access to patient-level data, and incorporated it into its Research Integrated Network of Systems (RINS), which links study-level data across disparate systems relevant to clinical research. Within the first year of CTMS implementation, 324 protocols were funneled through CTMS/RINS, with more than 2600 participants enrolled. Integrated data from CTMS/RINS have enabled near-real-time assessment of patient accrual and accelerated reimbursement from industry sponsors. For institutions with bioinformatics or programming capacity, the CTMS/RINS integration provides a powerful model for tracking and improving clinical trial efficiency, compliance, and cost-effectiveness.
This article presents the background to and prospects for a new initiative in archaeological field survey and database integration. The Roman Hinterland Project combines data from the Tiber Valley Project, Roman Suburbium Project, and the Pontine Region Project into a single database, which the authors believe to be one of the most complete repositories of data for the hinterland of a major ancient metropolis, covering nearly 2000 years of history. The logic of combining these databases in the context of studying the Roman landscape is explained and illustrated with analyses that show their capacity to contribute to major debates in Roman economy, demography, and the longue durée of the human condition in a globalizing world.
Introduction:
Personalized medicine has exposed wearable sensors as new sources of biomedical data, which are expected to accrue annual data storage costs of approximately $7.2 trillion by 2020 (>2000 exabytes). To improve the usability of wearable devices in healthcare, it is necessary to determine the minimum amount of data needed for accurate health assessment.
Methods:
Here, we present a generalizable optimization framework for determining the minimum necessary sampling rate for wearable sensors and apply our method to determine optimal optical blood volume pulse sampling rate. We implement t-tests, Bland–Altman analysis, and regression-based visualizations to identify optimal sampling rates of wrist-worn optical sensors.
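To make the downsample-and-compare logic concrete, here is a minimal sketch in Python. The sinusoidal toy pulse signal, the peak-based heart-rate estimator, and the candidate rates are illustrative assumptions, not the authors' pipeline; in the study itself the reference would be a high-rate wearable recording rather than a simulation.

```python
# Hypothetical sketch: estimate heart rate from a full-rate signal and from
# downsampled copies, then summarize agreement Bland-Altman style.
import numpy as np
from scipy.signal import find_peaks

def simulate_ppg(fs, duration_s=60.0, hr_bpm=72.0):
    """Toy blood-volume-pulse signal: a sinusoid at the heart-rate frequency plus noise."""
    t = np.arange(0.0, duration_s, 1.0 / fs)
    return np.sin(2.0 * np.pi * (hr_bpm / 60.0) * t) + 0.1 * np.random.randn(t.size)

def estimate_hr(x, fs):
    """Mean heart rate (bpm) from peak-to-peak intervals."""
    peaks, _ = find_peaks(x, distance=max(1, int(0.4 * fs)))  # ~0.4 s refractory period
    return 60.0 / (np.diff(peaks) / fs).mean()

fs_ref = 256  # assumed "full" sampling rate, Hz
for k in (2, 4, 8, 16):                     # candidate downsampling factors
    fs = fs_ref // k
    diffs = []
    for _ in range(50):                     # repeated trials for limits of agreement
        x = simulate_ppg(fs_ref)
        diffs.append(estimate_hr(x[::k], fs) - estimate_hr(x, fs_ref))
    bias, sd = np.mean(diffs), np.std(diffs)
    print(f"{fs:4d} Hz: bias {bias:+.2f} bpm, LoA ({bias - 1.96*sd:+.2f}, {bias + 1.96*sd:+.2f})")
```

The smallest rate whose bias and limits of agreement stay within a clinically acceptable band would then be the candidate optimum, which is the shape of the 21-64 Hz conclusion reported below.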
Results:
We determine the optimal sampling rate of wrist-worn optical sensors for heart rate and heart rate variability monitoring to be 21–64 Hz, depending on the metric.
Conclusions:
Determining the optimal sampling rate allows us to compress biomedical data and reduce storage needs and financial costs. We have used optical heart rate sensors as a case study for the connection between data volumes and resource requirements to develop methodology for determining the optimal sampling rate for clinical relevance that minimizes resource utilization. This methodology is extensible to other wearable sensors.
The Human Brain Project (HBP), an EU Flagship Initiative, is currently building an infrastructure that will allow integration of large amounts of heterogeneous neuroscience data. The ultimate goal of the project is to develop a unified multi-level understanding of the brain and its diseases, and beyond this to emulate the computational capabilities of the brain. Reference atlases of the brain are one of the key components in this infrastructure. Based on a new generation of three-dimensional (3D) reference atlases, new solutions for analyzing and integrating brain data are being developed. HBP will build services for spatial query and analysis of brain data comparable to current online services for geospatial data. The services will provide interactive access to a wide range of data types that have information about anatomical location tied to them. The 3D volumetric nature of the brain, however, introduces a new level of complexity that requires a range of tools for making use of and interacting with the atlases. With such new tools, neuroscience research groups will be able to connect their data to atlas space, share their data through online data systems, and search and find other relevant data through the same systems. This new approach partly replaces earlier attempts to organize research data based only on a set of semantic terminologies describing the brain and its subdivisions.
Genebanks play an important role in the conservation of global plant biodiversity. The European Search Catalogue for Plant Genetic Resources (EURISCO) was created as a central entry point to provide information on these collections. However, a major challenge lies in the heterogeneity of scientific plant names. This makes the selection of suitable plant material, e.g. for research or breeding purposes, significantly more difficult. For this reason, the taxonomic backbone of EURISCO has been completely revised. Search terms entered by users are now automatically checked against taxonomic reference repositories, allowing a variety of synonyms to be identified. In addition, a fuzzy search has been implemented, which makes the search function tolerant of erroneous data (e.g. caused by typing errors). Besides improvements of the search interface, more support will be given to EURISCO's data providers. The new developments provide a tool that makes it easier to identify problem cases within the data, such as accepted/non-accepted taxonomic names, and will successively improve the quality of taxonomic information in EURISCO.
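The typo-tolerant lookup described here can be pictured in a few lines of Python; the reference list, the similarity cutoff, and the use of difflib are illustrative assumptions rather than EURISCO's actual implementation, which checks against external taxonomic repositories.

```python
# Minimal fuzzy-matching sketch: map possibly misspelled plant names onto a
# reference list of accepted taxa (names and cutoff chosen for illustration).
import difflib

reference_taxa = [
    "Triticum aestivum",
    "Hordeum vulgare",
    "Solanum tuberosum",
    "Zea mays",
]

def fuzzy_lookup(query, names, cutoff=0.8):
    """Return up to three reference names whose similarity exceeds the cutoff."""
    return difflib.get_close_matches(query, names, n=3, cutoff=cutoff)

print(fuzzy_lookup("Triticum estivum", reference_taxa))  # typo -> ['Triticum aestivum']
```

A production system would combine such string similarity with an explicit synonym table so that accepted and non-accepted names resolve to the same accession.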
In this paper we use formal tools from category theory to develop a foundation for creating and managing models in systems where knowledge is distributed across multiple representations and formats. We define a class of models which incorporate three different representations (computations, logical semantics, and data) as well as model mappings (functors) to establish relationships between them. We prove that our models support model merge operations called colimits and use these to define a methodology for model integration.
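For readers less familiar with colimits, the simplest nontrivial case used in model merging, a pushout of two models over a shared interface, can be written out as follows; the notation here is generic category theory, not the paper's specific formalism.

```latex
% Merging models M_1 and M_2 along a shared interface S via a pushout.
% Given model mappings f : S -> M_1 and g : S -> M_2, the merged model
% M_1 +_S M_2 comes with inclusions i_1, i_2 making the square commute:
\[
\begin{array}{ccc}
S & \xrightarrow{\,f\,} & M_1 \\
{\scriptstyle g}\big\downarrow & & \big\downarrow{\scriptstyle i_1} \\
M_2 & \xrightarrow[\,i_2\,]{} & M_1 +_S M_2
\end{array}
\qquad i_1 \circ f = i_2 \circ g,
\]
% Universal property: any N with j_1 : M_1 -> N and j_2 : M_2 -> N satisfying
% j_1 f = j_2 g factors uniquely through M_1 +_S M_2. Intuitively, the pushout
% glues the two models together, identifying exactly the parts they share via S.
```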
The objective of this paper is to accurately determine mobile robots' position and orientation by integrating information received from odometry and an inertial sensor. The position and orientation provided by odometry are subject to different types of errors. To improve the odometry, an inertial measurement unit is exploited to give more reliable attitude information. However, the nonlinear dynamics of these systems and their complexities, such as multiple sources of error, make navigation difficult. Since the dynamic models of navigation systems are nonlinear in practice, in this study a Cubature Kalman Filter (CKF) is proposed to estimate and correct the errors of these systems. The information from odometry and a gyroscope is integrated using a CKF. Simulation results are provided to illustrate the superiority and higher reliability of the proposed approach in comparison with conventional nonlinear filtering algorithms such as the Extended Kalman Filter (EKF).
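The cubature step that distinguishes a CKF from an EKF is easy to sketch; the following Python fragment shows point generation and the time-update, with a toy unicycle odometry model and noise values that are placeholders rather than the paper's robot model.

```python
# Minimal CKF time-update sketch: propagate 2n equally weighted cubature
# points through a nonlinear process model instead of linearizing it (as an
# EKF would). The state here is a toy [x, y, heading] odometry vector.
import numpy as np

def cubature_points(x, P):
    """2n cubature points for mean x and covariance P."""
    n = x.size
    S = np.linalg.cholesky(P)                              # matrix square root of P
    xi = np.sqrt(n) * np.hstack((np.eye(n), -np.eye(n)))   # unit cubature directions
    return x[:, None] + S @ xi                             # shape (n, 2n)

def ckf_predict(x, P, f, Q):
    """Propagate mean and covariance through the nonlinear process model f."""
    pts = cubature_points(x, P)
    prop = np.apply_along_axis(f, 0, pts)   # apply f to every cubature point
    x_pred = prop.mean(axis=1)              # equal weights 1/(2n)
    d = prop - x_pred[:, None]
    return x_pred, d @ d.T / pts.shape[1] + Q

# Toy unicycle step: move 0.1 m along the current heading, then turn slightly.
f = lambda s: np.array([s[0] + 0.1 * np.cos(s[2]),
                        s[1] + 0.1 * np.sin(s[2]),
                        s[2] + 0.01])
x, P = np.zeros(3), 0.01 * np.eye(3)
x, P = ckf_predict(x, P, f, 1e-4 * np.eye(3))
print(x, np.diag(P))
```

The measurement update follows the same pattern, propagating the points through the observation model; avoiding Jacobians entirely is what makes the CKF attractive for the strongly nonlinear attitude dynamics described above.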
The need for coordinated regional and global electronic databases to assist prevention, early detection, rapid response, and control of biological invasions is well accepted. The Pacific Basin Information Node (PBIN), a node of the National Biological Information Infrastructure, has been increasingly engaged in the invasive species enterprise since its establishment in 2001. Since this time, PBIN has sought to support frontline efforts at combating invasions, through working with stakeholders in conservation, agriculture, forestry, health, and commerce to support joint information needs. Although initial emphasis has been on Hawaii, cooperative work with other Pacific islands and countries of the Pacific Rim is already underway and planned.
The field of disease ecology – the study of the spread and impact of parasites and pathogens within their host populations and communities – has a long history of using mathematical models. Dating back over 100 years, researchers have used mathematics to describe the spread of disease-causing agents, understand the relationship between host density and transmission and plan control strategies. The use of mathematical modelling in disease ecology exploded in the late 1970s and early 1980s through the work of Anderson and May (Anderson and May, 1978, 1981, 1992; May and Anderson, 1978), who developed the fundamental frameworks for studying microparasite (e.g. viruses, bacteria and protozoa) and macroparasite (e.g. helminth) dynamics, emphasizing the importance of understanding features such as the parasite's basic reproduction number (R0) and critical community size that form the basis of disease ecology research to this day. Since the initial models of disease population dynamics, which primarily focused on human diseases, theoretical disease research has expanded hugely to encompass livestock and wildlife disease systems, and also to explore evolutionary questions such as the evolution of parasite virulence or drug resistance. More recently there have been efforts to broaden the field still further, to move beyond the standard ‘one-host-one-parasite’ paradigm of the original models, to incorporate many aspects of complexity of natural systems, including multiple potential host species and interactions among multiple parasite species.
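As a concrete anchor for the microparasite framework referred to above, the canonical susceptible-infectious-recovered (SIR) equations and the threshold role of R0 can be stated in standard textbook notation (not tied to any single cited paper):

```latex
\[
\frac{dS}{dt} = -\beta S I, \qquad
\frac{dI}{dt} = \beta S I - \gamma I, \qquad
\frac{dR}{dt} = \gamma I,
\]
% where \beta is the transmission coefficient and 1/\gamma the mean infectious
% period. The infectious class grows from an initial susceptible pool S_0 only if
\[
R_0 = \frac{\beta S_0}{\gamma} > 1,
\]
% so the basic reproduction number R_0 is the expected number of secondary cases
% generated by one infectious individual in a wholly susceptible population, and
% R_0 = 1 marks the invasion threshold that control strategies aim to cross.
```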
Epidemiological data are often fragmented, partial, and/or ambiguous and unable to yield the desired level of understanding of infectious disease dynamics to adequately inform control measures. Here, we show how the information contained in widely available serology data can be enhanced by integration with less common type-specific data, to improve the understanding of the transmission dynamics of complex multi-species pathogens and host communities. Using brucellosis in northern Tanzania as a case study, we developed a latent process model based on serology data obtained from the field, to reconstruct Brucella transmission dynamics. We were able to identify sheep and goats as a more likely source of human and animal infection than cattle; however, the highly cross-reactive nature of Brucella spp. meant that it was not possible to determine which Brucella species (B. abortus or B. melitensis) is responsible for human infection. We extended our model to integrate simulated serology and typing data, and show that although serology alone can identify the host source of human infection under certain restrictive conditions, the integration of even small amounts (5%) of typing data can improve understanding of complex epidemiological dynamics. We show that data integration will often be essential when more than one pathogen is present and when the distinction between exposed and infectious individuals is not clear from serology data. With increasing epidemiological complexity, serology data become less informative. However, we show how this weakness can be mitigated by integrating such data with typing data, thereby enhancing the inference from these data and improving understanding of the underlying dynamics.
Mixed methods is a youthful but increasingly robust methodological movement characterised by a growing body of trans-disciplinary literature; prominent research methodologists and authorities; the emergence of mixed-methods-specific journals, research texts, and courses; and a growth in popularity amongst research funding bodies. Mixed methods is being utilised and reported within business and management fields, despite the quantitative traditions attached to certain business and management disciplines. This paper utilised a multistrand conversion mixed model research design to undertake a retrospective content analysis of refereed papers (n = 281) from the 21st Australian and New Zealand Academy of Management (ANZAM) Conference 2007. The aim of the study is to provide a methodological map of the management research reported at the conference, and in particular the use, quality and acceptance level of mixed methods research within business and management fields. Implications for further research are discussed, along with a call to the ‘first generation’ of business and management mixed methods researchers to instigate mixed methods research training and capacity building within their respective business schools, relevant academies and associated professional forums and publications.
Genebanks are important suppliers of genetic resources to the genomics research community, and access to the resulting information will allow traditional genebank users to better select genetic material for their breeding and scientific programmes. We discuss herein a possible solution to interconnect these data automatically based on semantic web technology.
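One way to picture the proposed interconnection is to publish each accession as RDF and link it to external genomics records by URI, so that both sides can be traversed automatically. The sketch below uses Python's rdflib; the namespace, property names, and identifiers are invented for illustration, standing in for whatever community vocabulary would actually be adopted.

```python
# Toy semantic-web sketch: describe a genebank accession as RDF triples and
# link it to a genomics sequencing run, then print the graph as Turtle.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

GB = Namespace("http://example.org/genebank/")   # hypothetical vocabulary
g = Graph()

acc = URIRef("http://example.org/genebank/accession/HOR1234")  # hypothetical barley accession
g.add((acc, RDF.type, GB.Accession))
g.add((acc, GB.taxon, Literal("Hordeum vulgare")))
g.add((acc, GB.linkedAssay, URIRef("http://example.org/genomics/run/RUN0001")))

print(g.serialize(format="turtle"))
```

Once genebank and genomics resources share stable URIs like these, a SPARQL query can join accession passport data with sequence-level results without manual curation.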
The information stored in animal feed databases is highly variable, in terms of both provenance and quality; therefore, data pre-processing is essential to ensure reliable results. Yet, pre-processing at best tends to be unsystematic; at worst, it may even be wholly ignored. This paper sought to develop a systematic approach to the various stages involved in pre-processing to improve feed database outputs. The database used contained analytical and nutritional data on roughly 20 000 alfalfa samples. A range of techniques were examined for integrating data from different sources, for detecting duplicates and, particularly, for detecting outliers. Special attention was paid to the comparison of univariate and multivariate solutions. Major issues relating to the heterogeneous nature of data contained in this database were explored, the observed outliers were characterized and ad hoc routines were designed for error control. Finally, a heuristic diagram was designed to systematize the various aspects involved in the detection and management of outliers and errors.
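The univariate-versus-multivariate contrast at the heart of that comparison can be illustrated with a short Python sketch: a sample that looks unremarkable on each variable separately but violates their joint correlation is missed by per-variable z-scores yet flagged by the Mahalanobis distance. The synthetic two-variable "feed" data and the conventional cutoffs below are assumptions for illustration, not the paper's calibrated routines.

```python
# Univariate z-score vs multivariate Mahalanobis outlier detection on
# synthetic correlated data, plus one planted correlation-breaking sample.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([40.0, 18.0], [[4.0, 3.0], [3.0, 4.0]], size=500)
X = np.vstack([X, [44.0, 14.0]])   # within range on each axis, but off the correlation

# Univariate rule: flag if |z| > 3 on any single variable
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
uni = (z > 3).any(axis=1)

# Multivariate rule: squared Mahalanobis distance vs a chi-square cutoff (df = 2)
mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
d = X - mu
md2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
multi = md2 > 13.8                 # ~ chi2.ppf(0.999, df=2)

print("flagged univariate:", uni.sum(), "| multivariate:", multi.sum())
print("planted sample flagged? univariate:", uni[-1], "| multivariate:", multi[-1])
```

The planted point sits only about two standard deviations out on each axis, so the univariate rule passes it, while its Mahalanobis distance (roughly 30) far exceeds the chi-square cutoff, which is exactly the failure mode that motivates the multivariate routines examined in the paper.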