We suggest that foundation models are general-purpose solutions similar to general-purpose programmable microprocessors, where fine-tuning and prompt engineering are analogous to coding for microprocessors. Evaluating general-purpose solutions is not like hypothesis testing. We want to know how well the machine will perform on an unknown program with unknown inputs for unknown users with unknown budgets and unknown utility functions. This paper is based on an invited talk by John Mashey, “Lessons from SPEC,” at an ACL-2021 workshop on benchmarking. Mashey started by describing the Standard Performance Evaluation Corporation (SPEC) benchmark, which has had more impact than benchmarks in our field because SPEC addresses an important commercial question: which CPU should I buy? In addition, SPEC can be interpreted to show that CPUs are 50,000 times faster than they were 40 years ago. It is remarkable that we can make such statements without specifying the program, users, task, dataset, etc. It would be desirable to make similar quantitative statements about improvements of general-purpose foundation models over years and decades without specifying tasks, datasets, use cases, etc.
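SPEC can make such machine-level claims because it aggregates per-program speedups with a geometric mean, so no single program dominates the score. A minimal sketch in Python, with purely hypothetical timings (SPEC's real suites, reference machines and run rules are considerably more involved):

```python
import math

def spec_ratio(ref_times, test_times):
    """SPEC-style score: geometric mean of per-benchmark speedups
    (reference time / measured time), so no single program dominates."""
    ratios = [r / t for r, t in zip(ref_times, test_times)]
    return math.exp(sum(math.log(x) for x in ratios) / len(ratios))

# Hypothetical timings (seconds) for three benchmark programs.
reference = [100.0, 200.0, 400.0]
machine = [10.0, 25.0, 40.0]
print(round(spec_ratio(reference, machine), 2))  # geometric mean of 10, 8, 10 -> 9.28
```

The geometric mean is the design choice that makes "X times faster" claims robust across decades: doubling performance on one program and halving it on another cancels out exactly.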
This paper investigates the intricate interplay between tax expenditures (TEs) and social policy. Leveraging the Global Tax Expenditures Database (GTED), we carry out the first data-driven comparative assessment of direct spending and TEs for social welfare across countries to shed light on this often-overlooked aspect of fiscal policy. Our research reveals prevalent TE usage for social purposes and substantial costs in terms of revenue forgone worldwide, averaging over 1 per cent of Gross Domestic Product (GDP) and 6 per cent of tax revenue. Our analysis showcases varying strategies employed by countries, particularly emphasizing the reliance of high-income economies on TEs granted through personal income taxes, and low/middle-income countries predominantly using value-added tax-related TEs for social objectives. Our results also highlight the importance of functions such as housing in contributing significantly to social spending through TEs, with the ratio of tax expenditures to direct spending reaching roughly 365 per cent in the US and 203 per cent in France. Hence, our study underlines the necessity for meticulous evaluation and efficient design of TEs to better align TE regimes with governments' social policy objectives as well as to minimise unintended social or economic consequences.
Surgical advancements in paediatric cardiovascular surgery have led to improved survival rates for those patients with the most complex CHDs, leading to greater numbers of patients who are living well into adulthood. Despite this new era of long-term survival, our current reporting systems continue to focus largely on short-term postoperative outcomes as the criteria to both rate and rank hospitals. Using such limited criteria may mislead the intended audiences: patients and families. The goal of this article is to describe the creation of a local benchmarking report which aims to retrospectively review long-term outcomes from our single centre. This report is updated annually and published on our cardiac surgery webpage in an effort to be as transparent as possible for our patient and family communities.
Local governments have an important role to play in creating healthy, equitable and environmentally sustainable food systems. This study aimed to develop and pilot a tool and process for local governments in Australia to benchmark their policies for creating healthy, equitable and environmentally sustainable food systems.
Design:
The Healthy Food Environment Policy Index (Food-EPI), developed in 2013 for national governments, was tailored to develop the Local Food Systems Policy Index (Local Food-EPI+) tool for local governments. To incorporate environmental sustainability and the local government context, this process involved a literature review and collaboration with an international and domestic expert advisory committee (n 35) and local government officials.
Setting:
Local governments.
Results:
The tool consists of sixty-one indicators across ten food policy domains (weighted based on relative importance): leadership; governance; funding and resources; monitoring and intelligence; food production and supply chain; food promotion; food provision and retail in public facilities and spaces; supermarkets and food sources in the community; food waste reuse, redistribution and reduction; and support for communities. Pilot implementation of the tool in one local government demonstrated that the assessment process was feasible and likely to be helpful in guiding policy implementation.
Conclusion:
The Local Food-EPI+ tool and assessment process offer a comprehensive mechanism to assist local governments in benchmarking their actions to improve the healthiness, equity and environmental sustainability of food systems and prioritise action areas. Broad use of this tool will identify and promote leading practices, increase accountability for action and build capacity and collaborations.
Sodium intake attributed to fast food is increasing globally. This research aims to develop maximum sodium reduction targets for New Zealand (NZ) fast foods and compare them with the current sodium content of products. Sodium content and serving size data were sourced from an existing database of major NZ fast-food chains. Target development followed a step-by-step process, informed by international targets and serving sizes, and previous methods for packaged supermarket foods. Sodium reduction targets were set per 100 g and per serving, using a 40% reduction in the mean sodium content or the value met by 35–45% of products. Thirty-four per cent (1797/5246) of products in the database had sodium data available for target development. Sodium reduction targets were developed for 17 fast-food categories. Per 100 g targets ranged from 158 mg for ‘Other salads’ to 665 mg for ‘Mayonnaise and dressings’. Per serving targets ranged from 118 mg for ‘Sauce’ to 1270 mg for ‘Burgers with cured meat’. The largest difference between the current mean sodium content and corresponding target was for ‘Other salads’ and ‘Grilled Chicken’ (both –40% per 100 g) and ‘Fries and potato products’ (–45% per serving), and the smallest, ‘Pizza with cured meat toppings’ (–3% per 100 g) and ‘Pies, tarts, sausage rolls and quiches’ (–4% per serving). The results indicate the display of nutrition information should be mandated and there is considerable room for sodium reduction in NZ fast foods. The methods described provide a model for other countries to develop country-specific, fast-food sodium reduction targets.
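The target-setting arithmetic described above can be sketched as follows; the category values are hypothetical, and the simple quantile rule is a rough stand-in for the paper's 35–45% band:

```python
import statistics

def sodium_target(values_mg, quantile=0.40):
    """Two candidate per-100 g sodium targets for one fast-food category:
    (a) a 40% reduction in the category mean, and
    (b) an approximate 40th-percentile value, i.e. a level that roughly
    40% of products already meet (the paper's 35-45% band).
    A sketch of the general approach, not the paper's exact method."""
    mean_based = 0.60 * statistics.mean(values_mg)
    idx = max(int(quantile * len(values_mg)) - 1, 0)
    quantile_based = sorted(values_mg)[idx]
    return mean_based, quantile_based

# Hypothetical sodium values (mg/100 g) for products in one category.
category = [300, 420, 480, 510, 560, 600, 650, 700, 720, 760]
print(sodium_target(category))  # -> (342.0, 510)
```

In practice the final target would be chosen between the two candidates, informed by international targets and serving sizes as the abstract describes.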
The point has repeatedly been made that validation is a crucial success factor in demonstrating the scientific contribution and ensuring the adoption of results. Still, researchers in design science validate their research findings too infrequently. We must all evaluate our claimed contributions on open benchmarks to improve validation quality and foster cumulative research. In this paper, we propose a meta-model to standardise and operationalise the concept of open scientific benchmarks in design science and to guide communities of researchers in the co-development of scientific benchmarks.
When do cross-national comparisons enable citizens to hold governments accountable? According to recent work in comparative politics, benchmarking across borders is a powerful mechanism for making elections work. However, little attention has been paid to the choice of benchmarks and how they shape democratic accountability. We extend existing theories to account for endogenous benchmarking. Using the COVID-19 pandemic as a test case, we embedded experiments capturing self-selection and exogenous exposure to benchmark information from representative surveys in France, Germany, and the UK. The experiments reveal that when individuals have the choice, they are likely to seek out congruent information in line with their prior view of the government. Moreover, going beyond existing experiments on motivated reasoning and biased information choice, endogenous benchmarking occurs in all three countries despite the absence of partisan labels. Altogether, our results suggest that endogenous benchmarking weakens the democratic benefits of comparisons across borders.
Edited by
David Lynch, Federal Reserve Board of Governors; Iftekhar Hasan, Fordham University Graduate Schools of Business; Akhtar Siddique, Office of the Comptroller of the Currency
Stress-testing models pose a unique set of challenges with respect to performance monitoring. In particular, unlike standard forecasting models that generate unconditional forecasts, stress-testing models generate conditional forecasts based on stress scenarios that are unlikely to occur. This critical difference greatly limits one’s ability to assess model projections with observed outcomes. We provide several different methods for this purpose.
We study an optimal investment problem under a joint limited expected relative loss and portfolio insurance constraint with a general random benchmark. By making use of a static Lagrangian method in a complete market setting, the optimal wealth and investment strategy can be fully determined along with the existence and uniqueness of the Lagrangian multipliers. Our numerical demonstration for various commonly used random benchmarks shows a trade-off between the portfolio outperformance and underperformance relative to the benchmark, which may not be captured by the widely used Omega ratio and its utility-transformed version, reflecting the impact of the benchmarking loss constraint. Furthermore, we develop a new portfolio performance measurement indicator that incorporates the agent’s utility loss aversion relative to the benchmark via solving an equivalent optimal asset allocation problem with a benchmark-reference-based preference. We show that the expected utility performance is well depicted by looking at this new portfolio performance ratio, suggesting a more suitable portfolio performance measurement under a limited loss constraint relative to a possibly random benchmark.
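For reference, the Omega ratio mentioned above is conventionally defined as the ratio of expected gains over a benchmark to expected losses below it. A sketch of the textbook form, which may differ in detail from the paper's utility-transformed variant:

```latex
% Omega ratio of wealth X relative to a (possibly random) benchmark B,
% where x^{+} = \max(x, 0):
\[
  \Omega(X; B) \;=\; \frac{\mathbb{E}\!\left[(X - B)^{+}\right]}
                          {\mathbb{E}\!\left[(B - X)^{+}\right]}
\]
% Expected outperformance over the benchmark divided by expected
% underperformance; values above 1 indicate gains dominate losses.
```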
Indicators are important sources of information about problems across many policy areas. However, despite a growing number of indicators across most policy areas, such as health care, business promotion, or environmental protection, we still know little about whether, how, and when such indicators affect the policy agenda. This article develops a theoretical answer to these questions and examines the implications using a new large-n dataset with 220,000 parliamentary questions asked by government and opposition MPs in Australia, Belgium, Denmark, Germany, France, Italy, and Spain. The data contain information on political attention to 17 problems, such as unemployment, CO2 emissions, and crime, from 1960 to 2015. Across this wealth of data, the article demonstrates that politicians respond to the severity and development of problem indicators over time and in comparison to other countries. Results also show that politicians respond much more when problem indicators develop negatively than when they develop positively.
In this chapter, we broadly distinguish research, as a process for deriving new knowledge, from evidence, as the knowledge that is produced and used within a specific context. Evidence from other sources is also used to inform health and treatment decisions in paediatric settings.
We introduce audits and benchmarking as important tools for measuring healthcare quality and safety, and discuss their relevance to the generation of clinical research questions. Within the construct of research co-design and evidence-based decision-making, this chapter also discusses special considerations for conducting ethical research with children and young people. Acknowledging that there may be age or developmental challenges, we explore ways in which children and young people can be supported to become more involved in setting their own research priorities and designing and ‘doing’ research.
Despite broad agreement on the need for comprehensive policy action to improve the healthiness of food environments, implementation of recommended policies has been slow and fragmented. Benchmarking is increasingly being used to strengthen accountability for action. However, there have been few evaluations of benchmarking and accountability initiatives to understand their contribution to policy change. This study aimed to evaluate the impact of the Healthy Food Environment Policy Index (Food-EPI) Australia initiative (2016–2020) that assessed Australian governments on their progress in implementing recommended policies for improving food environments.
Design:
A convergent mixed methods approach was employed incorporating data from online surveys (conducted in 2017 and 2020) and in-depth semi-structured interviews (conducted in 2020). Data were analysed against a pre-defined logic model.
Setting:
Australia.
Participants:
Interviews: twenty stakeholders (sixteen government, four non-government). Online surveys: fifty-three non-government stakeholders (52 % response rate) in 2017; thirty-four non-government stakeholders (36 % response rate) in 2020.
Results:
The Food-EPI process involved extensive engagement with government officials and the broader public health community across Australia. Food-EPI Australia was found to support policy processes, including as a tool to increase knowledge of good practice, as a process for collaboration and as an authoritative reference to support policy decisions and advocacy strategies.
Conclusions:
Key stakeholders involved in the Food-EPI Australia process viewed it as a valuable initiative that should be repeated to maximise its value as an accountability mechanism. The highly collaborative nature of the initiative was seen as a key strength that could inform design of other benchmarking processes.
Process models are among the principal artefacts used for managing design projects. However, the selection of effective modelling approaches can be difficult for design project managers, given that a plethora of tools exists for various modelling purposes. In addition, to date no systematic approach for the assessment and selection of process modelling approaches has been available to practitioners. This paper presents the development of criteria for benchmarking and selecting different process modelling tools. The results are based on three elements. (1) In a four-hour workshop undertaken by the Design Process SIG of the Design Society, bringing together around 20 international researchers and practitioners in design process modelling, an initial set of 58 criteria was brainstormed and consolidated during the workshop and in follow-up meetings. (2) The consolidated criteria were then compared with the literature. (3) The finalised criteria list was then validated by external experts in industry. The resulting list of 12 criteria provides a sound basis for practitioners to support a systematic selection of process modelling approaches. Further, it lays the foundation for a benchmarking tool, which is the subject of future work.
Developing agents capable of commonsense reasoning is an important goal in Artificial Intelligence (AI) research. Because commonsense is broadly defined, a computational theory that can formally categorize the various kinds of commonsense knowledge is critical for enabling fundamental research in this area. In a recent book, Gordon and Hobbs described such a categorization, argued to be reasonably complete. However, the theory’s reliability has not been independently evaluated through human annotator judgments. This paper describes such an experimental study, whereby annotations were elicited across a subset of eight foundational categories proposed in the original Gordon-Hobbs theory. We avoid bias by eliciting annotations on 200 sentences from a commonsense benchmark dataset independently developed by an external organization. The results show that, while humans agree on relatively concrete categories like time and space, they disagree on more abstract concepts. The implications of these findings are briefly discussed.
Despite the continued investment in Indigenous support networks and dedicated education units within universities, levels of key performance indicators for Indigenous students—access, participation, success and completion (attainment)—remain below that of the overall domestic student population in most institutions. It remains important to determine what works to achieve Indigenous student success in higher education. This paper proposes that such methods have an integral role to play in providing a holistic view of Indigenous participation and success at university, and are particularly useful in the development and evaluation of strategies and programs. This project found no quantitative correlation between financial investment and success rate for Indigenous students. A negative correlation between access rate and success rate suggests that factors other than those that encourage participation are important in supporting successful outcomes. Those universities that have high success rates have a suite of programs to support Indigenous students, but it is not immediately clear which of these strategies and programs may be most effective to facilitate Indigenous student success rates. In this discussion, we suggest that a multi-layered determinants model is a useful way to conceptualise the many factors that may impact on student success, and how they might intersect.
There are sparse data on the outcomes of endoscopic stapling of pharyngeal pouches. The Mersey ENT Trainee Collaborative compared regional practice against published benchmarks.
Methods
A 10-year retrospective analysis of endoscopic pharyngeal pouch surgery was conducted and practice was assessed against eight standards. Comparisons were made between results from the tertiary centre and other sites.
Results
A total of 225 procedures were performed (range of 1.2–9.2 cases per centre per year). All centres achieved 90 per cent resumption of oral intake within 2 days. All centres achieved less than 2-day hospital stays. Primary success (84 per cent; endoscopic stapling was abandoned in 16 per cent of cases), symptom resolution (83 per cent) and recurrence rates (13 per cent) failed to meet the standard across the non-tertiary centres.
Conclusion
Endoscopic pharyngeal pouch stapling is a procedure with low mortality and a brief in-patient stay. There was significant variance in outcomes across the region. This raises the question of whether this service should become centralised and the preserve of either tertiary centres or sub-specialist practitioners.
This study aimed to identify diets with improved nutrient quality and environmental impact within the boundaries of dietary practices.
Design:
We used Data Envelopment Analysis to benchmark diets for improved adherence to food-based dietary guidelines (FBDG). We then optimised these diets for dietary preferences, nutrient quality and environmental impact. Diets were evaluated using the Nutrient Rich Diet score (NRD15.3), diet-related greenhouse gas emission (GHGE) and a diet similarity index that quantified the proportion of food intake that remained similar as compared with the observed diet.
Setting:
National dietary surveys of four European countries (Denmark, Czech Republic, Italy and France).
Subjects:
Approximately 6500 adults, aged 18–64 years.
Results:
When dietary preferences were prioritised, NRD15·3 was ~6 % higher, GHGE was ~4 % lower and ~85 % of food intake remained similar. This diet had higher amounts of fruit, vegetables and whole grains than the observed diet. When nutrient quality was prioritised, NRD15·3 was ~16 % higher, GHGE was ~3 % lower and ~72 % of food intake remained similar. This diet had higher amounts of legumes and fish and lower amounts of sweetened and alcoholic beverages. Finally, when environmental impact was prioritised, NRD15·3 was ~9 % higher, GHGE was ~21 % lower and ~73 % of food intake remained similar. In this diet, red and processed meat partly shifted to either eggs, poultry, fish or dairy.
Conclusions:
Benchmark modelling can generate diets with improved adherence to FBDG within the boundaries of dietary practices, but fully maximising health and minimising GHGE cannot be achieved simultaneously.
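The Data Envelopment Analysis benchmarking step used in this study can be illustrated with a minimal input-oriented CCR model; the diets, input (GHGE) and output (nutrient score) below are hypothetical, and the paper's specification is richer:

```python
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(inputs, outputs, unit):
    """Input-oriented CCR efficiency score for one decision-making unit
    (here: one diet); 1.0 means the unit lies on the benchmark frontier.
    A minimal textbook sketch, not the paper's exact DEA specification.
    inputs:  (n_units, n_inputs)  e.g. GHGE per diet
    outputs: (n_units, n_outputs) e.g. nutrient-quality scores"""
    X, Y = np.asarray(inputs, float), np.asarray(outputs, float)
    n_in, n_out = X.shape[1], Y.shape[1]
    # Variables: output weights u, then input weights v. Maximise u.y_o.
    c = np.concatenate([-Y[unit], np.zeros(n_in)])
    # Normalisation constraint: v.x_o = 1.
    A_eq = np.concatenate([np.zeros(n_out), X[unit]])[None, :]
    # Frontier constraints: u.y_j - v.x_j <= 0 for every unit j.
    A_ub = np.hstack([Y, -X])
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(X)),
                  A_eq=A_eq, b_eq=[1.0], bounds=(0, None))
    return -res.fun

# Hypothetical diets: input = GHGE, output = nutrient score.
ghge = [[2.0], [3.0], [4.0]]
score = [[60.0], [90.0], [80.0]]
print(round(dea_efficiency(ghge, score, 2), 3))  # (80/4)/(90/3) ~ 0.667
```

Units scoring below 1.0 are benchmarked against efficient peers, which is how DEA identifies observed diets with better nutrient quality per unit of environmental impact.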
The chapter explains why the EU has so far failed to intervene in private fisheries governance. The chapter starts by comparing private governance schemes since the 1990s. It then analyses EU policy discussions until late 2017, showing that until very recently all involved stakeholders agreed that the fragmentation of the private governance market needed to be addressed. Differences of opinion on the desirability of publicly supporting product differentiation, however, have continued to exist. While most stakeholders consider the costs such differentiation would impose on European producers too high and therefore support procedural regulation, the European Parliament has consistently favored both standards and procedural regulations in the form of an EU-level certification and eco-labeling scheme. Attempts to create a policy failed in 2008–2009, when a legislative proposal for procedural regulation was abandoned, and in 2013, when the discussion was integrated into the reform of the Common Fisheries Policy. A 2016 report on feasible policy options, moreover, questioned the fragmentation of the private governance market, casting further doubt on the likelihood of public intervention.
Recent trends in multimedia technologies indicate the need for richer imaging modalities to increase user engagement with the content. Among other alternatives, point clouds denote a viable solution that offers an immersive content representation, as witnessed by current activities in JPEG and MPEG standardization committees. As a result of such efforts, MPEG is at the final stages of drafting an emerging standard for point cloud compression, which we consider the state of the art. In this study, the entire set of encoders that have been developed in the MPEG committee are assessed through an extensive and rigorous analysis of quality. We initially focus on the assessment of encoding configurations that have been defined by experts in MPEG for their core experiments. Then, two additional experiments are designed and carried out to address some of the identified limitations of the current approach. As part of the study, state-of-the-art objective quality metrics are benchmarked to assess their capability to predict the visual quality of point clouds under a wide range of radically different compression artifacts. To carry out the subjective evaluation experiments, a web-based renderer is developed and described. The subjective and objective quality scores, along with the rendering software, are made publicly available to facilitate and promote research in the field.
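A common geometric measure in such point cloud quality evaluations is the symmetric point-to-point (D1) error. A sketch using a nearest-neighbour search, not the committee's official metric software:

```python
import numpy as np
from scipy.spatial import cKDTree

def p2p_mse(reference, degraded):
    """Symmetric point-to-point (D1) geometric error between two point
    clouds: mean squared nearest-neighbour distance, computed in both
    directions with the worse (larger) value kept, as is common in
    MPEG-style evaluations. A sketch, not the official metric code."""
    err_ab = cKDTree(degraded).query(reference)[0]
    err_ba = cKDTree(reference).query(degraded)[0]
    return max(np.mean(err_ab ** 2), np.mean(err_ba ** 2))

# Tiny synthetic example: a unit cube's corners vs. a slightly shifted copy.
ref = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
deg = ref + 0.01
print(round(p2p_mse(ref, deg), 6))  # 3 * 0.01**2 = 0.0003
```

Scores like this are usually reported as a PSNR against the cloud's bounding-box diagonal; predicting subjective quality from such numbers is exactly what the benchmarked objective metrics attempt.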