Version 1. NIHR Open Res. 2023; 3: 59.
Published online 2023 Nov 21. doi:10.3310/nihropenres.13471.1
PMCID: PMC11320033
PMID: 39139276
Vincent G Nguyen, Conceptualization, Investigation, Methodology, Writing – Original Draft Preparation, Writing – Review & Editing,a,1 Kate Marie Lewis, Conceptualization, Investigation, Methodology, Writing – Original Draft Preparation, Writing – Review & Editing,1 Ruth Gilbert, Conceptualization, Data Curation, Funding Acquisition, Investigation, Project Administration, Resources, Supervision, Writing – Review & Editing,1 Lorraine Dearden, Conceptualization, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing,2 and Bianca De Stavola, Conceptualization, Funding Acquisition, Investigation, Methodology, Project Administration, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing1
Abstract
Introduction
One third of children in English primary schools have additional learning support called special educational needs (SEN) provision, but children born preterm are more likely to have SEN than those born at term. We aim to assess the impact of SEN provision on health and education outcomes in children grouped by gestational age at birth.
Methods
We will analyse linked administrative data for England using the Education and Child Health Insights from Linked Data (ECHILD) database. A target trial emulation approach will be used to specify data extraction from ECHILD, comparisons of interest and our analysis plan. Our target population is all children enrolled in year one of state-funded primary school in England who were born in an NHS hospital in England between 2003 and 2008, grouped by gestational age at birth: extremely preterm (24-<28 weeks), very preterm (28-<32 weeks), moderately preterm (32-<34 weeks), late preterm (34-<37 weeks) and full term (37-<42 weeks). The intervention of interest will comprise categories of SEN provision (including none) during year one (age five/six). The outcomes of interest are rates of unplanned hospital utilisation, educational attainment, and absences by the end of primary school education (year six, age 11). We will triangulate results from complementary estimation methods including the naïve estimator, multivariable regression, g-formula, inverse probability weighting, inverse probability weighting with regression adjustment and instrumental variables, along with a variety of causal contrasts (average treatment effect, overall, and on the treated/not treated).
Ethics and dissemination
We have existing research ethics approval for analyses of the ECHILD database described in this protocol. We will disseminate our findings to diverse audiences (academics, relevant government departments, service users and providers) through seminars, peer-reviewed publications, short briefing reports and infographics for non-academics (published on the study website).
Keywords: Gestational age, Intervention, Special educational needs, Trial emulation
Plain Language summary
One third of all children need extra help with learning in school, such as support from a teaching assistant. Children born preterm are more likely to need extra help compared to those born at term. In England, this help is called special educational needs (SEN) provision. The aim of this study is to find out whether special educational needs provision affects education and health outcomes. We will use information collected by hospitals and schools for all children who were born in England between 2003 and 2008. We will compare children who received extra help in school with children who did not, among those who have a similar gestational age at birth.
Background
In the state-funded educational system in England, the system of reasonable adjustments to support children who experience difficulties learning is known as special educational needs (SEN) provision. The current version of SEN provision falls under two categories: SEN support and Education, Health and Care Plans (EHCPs). SEN support provides classroom-based support, such as extra help from a teacher (or assistant) or access to special learning programmes. EHCPs provide support for pupils who require more support than is available through SEN support. Due to the funding and organisational streams of SEN provision, allocation of SEN provision has been changing over time, impacted by changes in legislation, school governance structure and local authority (Liu et al., 2020). SEN provision is provided more frequently to children with health problems associated with low academic attainment such as children born preterm (Alterman et al., 2021), with congenital anomalies, such as cleft lip and palate (Fitzsimons et al., 2018), or with congenital heart defects (Glinianaia et al., 2021). However, the potential impact of SEN provision on educational and health outcomes during primary school has not been evaluated.
Children who are born preterm (i.e. <37 weeks gestation) disproportionately experience long-term difficulties compared to their full-term peers, including lower educational outcomes (Libuy et al., 2023), higher burden of comorbidities (particularly in very premature births) (Mowitz et al., 2022) and higher contact with health services and emergency health services (Coathup et al., 2020). Increasing rates of SEN provision with earlier gestational age at birth in primary schools in England have been previously documented (Libuy et al., 2023). There are also descriptive publications showing increased hospital utilisation by gestational age (Coathup et al., 2020), and education performance by gestational age (Libuy et al., 2023). However, there is limited evidence on the impact of SEN provision on academic performance, school absences and hospital utilisation in pupils who need SEN provision.
We will emulate a pragmatic target trial using linked administrative school and hospital records in the ECHILD database. We will separately analyse children grouped according to gestational age at birth who, particularly in the most premature groups, have a similar need for SEN provision (Libuy et al., 2023). For each gestational age group, we will estimate the causal effect of SEN provision in year one of primary school on school attainment, school absences and rates of unplanned hospital admissions by the end of primary school (year six, age 10/11).
The emulated target trial aims to reduce risk of confounding and selection bias. Firstly, using known and presumed confounders of the relationship between SEN provision and our outcomes, we will evaluate the assumptions to be invoked for the estimation of causal links between them. In particular, we will evaluate the positivity assumption for the probability of receiving different categories of SEN provision (no SEN provision, SEN support in mainstream school, EHCP in mainstream school, special school attendance) within each gestational age group; that is, that for all combinations of covariates, there is a non-zero probability of recording each category of SEN provision. Secondly, for each gestational age group where the positivity assumption holds (Zhu et al., 2021), assuming there is no unmeasured confounding, we will estimate and compare potential educational and health outcomes under different treatment regimens (no SEN provision, SEN support in mainstream school, EHCP in mainstream school, special school attendance).
Methods
Patient and Public Involvement
Prior to developing this protocol, independent meetings were conducted with stakeholders (parents, pupils, teachers) from existing patient advocacy groups including the Young Person’s Advisory Group (YPAG), Council for Disabled Children’s group (FLARE) and the Great Ormond Street National Children’s Bureau Families Research Advisory Group (FRAG). On 14 November 2020, FLARE were introduced to the ECHILD dataset, its use of linked administrative data and the observational study design, all of which were warmly received. Further meetings were held with FLARE on the 18th of September 2021 and with YPAG for research at Great Ormond Street Hospital on the 27th of November 2021. This engagement identified that school entry is a key milestone when SEN provision is required. Therefore, in the proposed study, we have used school start as our entry point and will generate further target trials based upon further stakeholder engagement. The Great Ormond Street Hospital for Children’s NHS Foundation Trust Young People’s Forum voiced on 20 March 2021 that school absences were an important topic for research. Using these interactions, we created our research question, which was presented to the HOPE study steering committee. The committee includes parents of children with disabilities who will review and advise on the presentation and dissemination of the study findings. Records and learnings from public engagements can be found here.
Study design
We will apply a trial emulation framework to observational educational data linked to healthcare data. Analyses will be conducted in the Office for National Statistics Secure Research Service using Stata 17 and R version 4.0.2 (open source, free software). Once written, the code for the study, including algorithms to identify the population, exposure, outcomes, and confounders, will be made publicly available on publication of the full manuscript.
Data source and linkage
We will use the ECHILD database, a pseudo-anonymised dataset that links Hospital Episode Statistics (HES) with the National Pupil Database (NPD). A linkage rate of 95% has been reported between NPD and HES in ECHILD, with high linkage rates attributed to a two-stage linkage process (Libuy et al., 2021).
In brief, the ECHILD extract of NPD contains pupil-level data from state schools in England for academic terms between 2006 and 2020 (Mc Grath-Lone et al., 2022). This includes school, local authority, age, gender, ethnicity, first language, socioeconomic status, free school meal status, recorded absences, social care/children in need related data and SEN status. In addition to the NPD, school level characteristics such as school type (including special or mainstream), school rating, and governance are available through the Department for Education’s open-source ‘Get Information about Schools’ (GIAS) register, and linkable to ECHILD using the school’s unique reference number (GOV.UK, 2022).
The ECHILD extract of HES contains details on admitted patient care, outpatient appointments, accident and emergency utilisation, and critical care from 1997 to 2021. It contains details on admission and discharge dates, patient characteristics (e.g., sex, ethnicity, area of residence) and clinical information recorded during hospital admissions (such as details of diagnoses and operations). HES covers 99% of public hospital activity in England (Herbert et al., 2017). HES also contains birth records which record characteristics such as gestational age, birthweight and maternal age; missingness in an individual’s birth record can be complemented using the corresponding mother’s delivery record. Furthermore, since 1998, HES records are also linked to ONS Mortality data covering information on mortality causes and timing of deaths.
Further details of the ECHILD dataset are documented by Mc Grath-Lone et al., 2022.
Population and follow-up
Our population is singleton children who were born in NHS-funded hospitals in England between 1 September 2003 and 31 August 2008 and were enrolled in year one of a state-funded primary school in England at age five/six years (see Figure 1). Children will be excluded if they do not have complete information on gestational age. Children will also be excluded from analyses of educational outcomes if they have missing data on the early years foundation stage profile (in reception, age four/five). We will also exclude children with a gestational age of <24 or >44 weeks or those with implausible gestational ages based on birthweight because of a high risk of misclassification. Comparisons of included and excluded children will help to inform whether there are issues of selection bias. This population was chosen as these children can be followed up to the end of primary school (year six, age 10/11) in ECHILD, with the latest academic year of follow up before the COVID-19 pandemic.
Figure 1.
Expected age at entry into primary school years one to six, by birth year and follow-up year; Y = year; a = defined according to the academic calendar (i.e., 2003/04 includes 1 September 2003 to 31 August 2004, inclusive).
The study population will be followed up from the January census in year one (age five/six) until the first chronological event of: end of primary school (year six, age 11 at exit), loss to follow-up or end of study (30th July 2019). Children will be considered lost to follow-up if they no longer appear in any NPD school census; this may be due to transfer to a non-government funded school or alternative provision, off-rolling (where pupils are illegally excluded from school) (Jay et al., 2022), emigration or death. We begin follow up in year one rather than reception (the first year of primary school in England) as it is the first full school year when education is compulsory for all children. We use the January census (rather than the October census) to allow time for pupils to be assigned SEN provision.
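The censoring rule above (follow-up ends at the first of: end of primary school, loss to follow-up, or end of study) can be sketched as follows. This is a minimal illustration and not the study code: the function name, the argument names and the use of the last census date as the date of loss to follow-up are assumptions made for the example.

```python
from datetime import date

STUDY_END = date(2019, 7, 30)  # end of study as stated in the protocol


def end_of_follow_up(end_of_year_six, last_census_seen=None, study_end=STUDY_END):
    """Return (date, reason) for the earliest event that ends follow-up.

    end_of_year_six: date the pupil would finish primary school.
    last_census_seen: date of the last school census in which the pupil
        appears, or None if the pupil is never lost (hypothetical field).
    """
    candidates = [
        (end_of_year_six, "end of primary school"),
        (study_end, "end of study"),
    ]
    if last_census_seen is not None:
        candidates.append((last_census_seen, "lost to follow-up"))
    # The earliest date wins; ties keep the order listed above.
    return min(candidates, key=lambda pair: pair[0])
```

For example, a pupil last seen in a census in January 2012 would be censored at that date, while a pupil observed throughout would exit at the end of year six or at the end of the study, whichever comes first.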
Subgroups
We expect the impact of SEN to vary according to the child’s need for SEN, which is correlated with decreasing gestational age at birth (Libuy et al., 2023). We will therefore conduct all analyses separately for five subgroups, defined by completed weeks gestation at birth: extremely preterm (24-<28 weeks); very preterm (28-<32 weeks); moderately preterm (32-<34 weeks); late preterm (34-<37 weeks); full term (37-<42 weeks) (ONS, 2015).
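As an illustration of the subgroup definition, the sketch below maps completed weeks of gestation to the five bands listed above. The function name and the choice to return `None` for values outside 24 to <42 weeks are assumptions for the example, not part of the protocol.

```python
def gestational_age_group(weeks):
    """Map completed weeks of gestation to one of the five subgroups.

    Returns None for values outside the 24 to <42 week range.
    """
    bands = [
        (24, 28, "extremely preterm"),
        (28, 32, "very preterm"),
        (32, 34, "moderately preterm"),
        (34, 37, "late preterm"),
        (37, 42, "full term"),
    ]
    for lower, upper, label in bands:
        # Each band includes its lower bound and excludes its upper bound.
        if lower <= weeks < upper:
            return label
    return None
```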
Intervention variable
Our intervention consists of four categories of recorded SEN provision in the January census of year one of school: none; SEN support (previously known as School Action/School Action Plus) at mainstream school; EHCP (previously known as statement of SEN) at mainstream school; and special school attendance (where the vast majority of children have an EHCP). Whilst SEN provision can change throughout a child’s educational journey, our implementation of trial emulation focusses on an observational analogue of intention-to-treat (ITT) analysis of SEN at the start of compulsory education. This analyses the assignment of treatment and not whether treatment was adhered to or provided. We choose the start of compulsory education as we believe this is a population in need of SEN provision from the start of their educational journey based upon prior evidence of educational (Libuy et al., 2023) and healthcare needs (Coathup et al., 2020).
Outcome variables
We will evaluate both health and educational outcomes.
For health outcomes, we will evaluate unplanned hospital utilisation, consisting of the number of unplanned admissions to hospital (defined by the admission method in the first episode of care) and contacts with accident and emergency departments between January of year one (age five/six) and the end of year six (age 11) (Harron et al., 2018).
For educational outcomes, we will evaluate key stage two English and mathematics assessments (taken in Year six, at ages 10/11), including whether assessments are taken (yes or no) and, if taken, attainment in the assessments. To account for time-varying changes in recording of educational outcomes, we will use standardised scores within academic year.
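Standardising scores within academic year can be sketched as below. This is a minimal illustration under stated assumptions: the input layout (a list of `(academic_year, raw_score)` pairs) is hypothetical, and the population standard deviation is used here, which the protocol does not specify.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_within_year(records):
    """records: list of (academic_year, raw_score) pairs.

    Returns z-scores in the same order, each computed against the mean and
    population standard deviation of the pupil's own academic year.
    """
    by_year = defaultdict(list)
    for year, score in records:
        by_year[year].append(score)
    stats = {year: (mean(scores), pstdev(scores)) for year, scores in by_year.items()}
    z_scores = []
    for year, score in records:
        centre, spread = stats[year]
        # A year with no variation gets z = 0 rather than a division by zero.
        z_scores.append((score - centre) / spread if spread > 0 else 0.0)
    return z_scores
```

A score of 110 in a year with mean 100 and standard deviation 10 and a score of 104 in a year with mean 102 and standard deviation 2 both map to a z-score of 1, which is what makes results comparable across years with different scoring schemes.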
We will also evaluate the number of absences during primary school (January year one to the end of year six) including unauthorised absences and absences related to illness and dental or medical appointments.
Covariates
To account for determinants of SEN provision assignment in children with similar gestational ages, we will use information on covariates known or suspected to influence (or be associated with) SEN provision based upon prior literature (Coathup et al., 2020; Hutchinson, 2021; Libuy et al., 2023). Table 1 shows our preliminary list of sociodemographic, educational and health related covariates which are related to SEN provision and both educational and health related outcomes. We will use DAGitty version 3.0, an open-source piece of software, to create directed acyclic graphs (DAGs) that will guide our selection of the variable adjustment set, in order to reduce the risk of unaccounted confounding, overadjustment and potentially mediating away any true effects.
Table 1.
Potential confounders.
Covariate Group | Covariate | Categories of measurement | Source |
---|---|---|---|
Clinical | Biological sex | Female; Male; Unknown (depending on numbers) | HES |
Clinical | Major congenital anomaly | Presence of congenital anomaly (yes or no), based on the Hardelid UK chronic condition ICD-10 code list identified in infant hospital admissions up to age 2 (Hardelid et al., 2014) | HES |
Clinical | Prior unplanned hospitalisation usage before year one of school | Number of days in which a child is recorded as attending an accident and emergency department or admitted to hospital in an emergency, adjusted for person-time | HES |
Education | Early years foundation stage profile (English and mathematics score) | Standardised z-score for English and mathematics within academic year | NPD |
Education | School Governance Type | Local authority managed; Academy; Other | GIAS |
Education | School Type | Mainstream; Special; Alternative Provision; Pupil Referral Unit | GIAS |
Education | Pupil Teacher Ratio | Ratio depicting the number of pupils per teacher in the school | GIAS |
Sociodemographic | Child's ethnic group | Asian; Black; Mixed or multiple ethnic groups; White; Other | NPD |
Sociodemographic | Maternal age at birth | Continuous values between 10–60. We will censor ages below 10 and above 60 because of a high risk of misclassification | HES |
Sociodemographic | Free school meal | Eligible for free school meals; Not eligible for free school meals | NPD |
Sociodemographic | Month of birth | January to December | HES and NPD must match |
Sociodemographic | Deprivation at birth | IMD deciles | HES |
Sociodemographic | Deprivation at start of school | IDACI quintiles | NPD |
Sociodemographic | English as a first language | Recorded as English; Not recorded as English; Unknown | NPD |
HES = Hospital Episode Statistics, GIAS = Get Information about Schools, NPD = National Pupil Database
Bias
To reduce confounding and other sources of bias impacting data collected outside of a randomised controlled trial setting, we will adopt the Target Trial Emulation (TTE) framework (Hernán et al., 2022). TTE maps observational data to a hypothetical target experimental trial counterpart by creating the specification of an ideal (pragmatic) trial and using this as a basis to shape the observational study design. TTE consists of, firstly, defining the specifications of a hypothetical, ideal experimental trial of the causal question of interest (including the corresponding causal contrast); secondly, emulating the specifications of the ideal target trial using observational data; and thirdly, estimating the effects of interest using the emulated trial data. The first component of TTE includes defining inclusion/exclusion criteria on entry, a treatment strategy (including time of assignment and entry), follow-up frequency and modality, outcome measures, causal contrasts of interest and the analytical estimation methods for an ideal trial. Using the second component of TTE, observational data are wrangled to emulate the distribution of the data as if they had been gathered prospectively in the ideal trial. Finally, the third component of TTE requires using methods to adjust for known and suspected confounding. In Table 2, we describe the ideal target trial that would be designed to investigate the causal effect of SEN provision (by the upcoming January Census in the first year of compulsory education) on the relevant outcomes and the equivalent emulated trial to be generated from ECHILD.
Table 2.
Summary protocol: target trial emulation to evaluate the effect of SEN provision on assessment scores and unplanned hospital contacts.
Protocol component | Target trial specification | Emulation study | Potential challenges and possible solutions |
---|---|---|---|
Eligibility criteria | Born in England between 1 September 2003 and 31 August 2008. Started year one in a state-funded Taken part in the EYFSP assessments | Born in England between 2003 and 2007 with gestational age recorded in birth/delivery record. Linked HES-NPD records. Recorded start of year one between 2009/10 EYFSP assessment is recorded. | Based upon prior experience of using these administrative data, we expect some children appear in year one twice - these will be removed due to uncertainties about the reliability of these data; not all pupils have EYFSP and teacher strikes are expected to reduce key stage 2 assessment availability - missingness patterns will be examined and when a MAR assumption is defensible, multiple imputation will be performed and then the potential selection bias of an incorrect MAR assumption evaluated in sensitivity analyses |
Study design | Randomised controlled trial | Trial emulation framework applied to linked observational hospital-school data | Potential residual or uncontrolled confounding by indication |
Data structure | Prospective data collection as part of the randomised controlled trial | Retrospective wrangling of administrative data leading to prospective information | Missingness patterns will be examined and when a MAR assumption is defensible, multiple imputation will be performed and then the potential selection bias of an incorrect MAR assumption evaluated in sensitivity analyses |
Outcome | •Key stage 2 assessments •School absences (unauthorised and health related) •Unplanned hospital utilisation | •Key stage 2 assessments •School absences (unauthorised and health related) •Unplanned hospital utilisation | Teacher strikes are expected to reduce key stage 2 assessment availability – missingness patterns will be examined as outlined above |
Treatments to be compared | Categories of SEN provision: none, SEN support in mainstream school, EHCP in mainstream school, special school attendance | Categories of SEN provision where there exists pairwise common support: none, SEN support in mainstream school, EHCP in mainstream school and special school attendance | As there may be a delay in applying for SEN provision; we will consider a sensitivity analysis where our treatment assignment will be by year two instead of year one |
Causal contrasts | Intention to treat for SEN provision assignment in the first full year of compulsory education (year one, age five on entry), with none as the reference category | Observational analogue of the intention to treat for SEN provision as recorded in the first full year of compulsory education (year one, age five on entry). Additionally, the average treatment effect in the treated; the average treatment effect in the non-treated (see definitions in Table 4) | |
Analysis plan | Logistic and linear regression for key stage 2 results Poisson (or negative binomial) | Educational outcomes - logistic and linear regression modelling with appropriate control for confounding adjustment and standardisation (such as regression adjustment and standardisation, propensity score-based methods). Clustering by school and/or local authority to be dealt with using either mixed effects models or robust inference (e.g., generalised estimating equations). Health outcomes - Poisson (or negative binomial) regression models with appropriate control for confounding, followed by standardisation. |
EHCP=education, health and care plan; EYFSP=early years foundation stage profile; HES=hospital episode statistics; LA=local authority; MAR=missing at random; MCA=major congenital anomalies; MSOA=Middle Layer Super Output Area; NPD=national pupil database; SEN=special educational needs
Statistical analysis
Data wrangling. Based upon the proportion and mechanisms of missingness in the data, we will first use future recordings to complement missing baseline covariates such as gender; secondly, prior to data imputation, we will use non-missing data in one source (HES or NPD) to complement missing values in the other; for example, using the sex variable from HES to complement missing values in the NPD gender variable (Azur et al., 2011).
Exploratory Analysis. We will first analyse the feasibility counts of the ECHILD data, including gestational age subgroups and the distribution of variables including our exposure (SEN provision) and confounders (Table 1). This will include assessing the feasibility of including children attending alternative provision (including pupil referral units) in our eligibility criteria and follow up; these groups are assumed to have small numbers and hence their inclusion may lead to violations of the positivity assumption.
To understand whether there are violations of the positivity assumption (i.e., whether pupils who are recorded to be requiring different categories of SEN provision are comparable), we will calculate and compare the propensity score distributions for each SEN category within each gestational age group. We will compare the density distributions between each pair of groups (Rassen et al., 2013), for example, none versus SEN support in mainstream school, SEN support versus EHCP in mainstream school, none versus EHCP in mainstream school and so on. Propensity scores for each SEN provision category will be estimated using logistic regression; to assess their robustness, binary machine learning predictors of each SEN provision category, such as tree-based algorithms, will be used and the resulting propensity scores compared to those obtained when using logistic regression (Lee Brian et al., 2009).
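One simple way to summarise pairwise overlap of propensity scores is the range shared by both groups. The sketch below is an assumption-laden illustration, not the density comparison described above: it uses a crude min/max rule, takes already-estimated propensity scores as plain lists, and the function names are hypothetical.

```python
def common_support(scores_a, scores_b):
    """Return the (lower, upper) range of propensity scores observed in both
    groups, or None if the two ranges do not overlap (min/max rule)."""
    lower = max(min(scores_a), min(scores_b))
    upper = min(max(scores_a), max(scores_b))
    return (lower, upper) if lower <= upper else None


def share_within_support(scores, support):
    """Proportion of a group's scores that fall inside the shared range."""
    if support is None:
        return 0.0
    lower, upper = support
    inside = sum(1 for s in scores if lower <= s <= upper)
    return inside / len(scores)
```

A low share within the common range for either group in a pair would suggest that the comparison relies heavily on extrapolation, which is the situation the positivity check is meant to flag.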
Causal inference. Our causal analyses will be conducted for pairs of interventions where the causal assumptions of non-interference, consistency, positivity, and conditional exchangeability are assumed to hold (Hernán, 2012) (see Table 3). For health outcomes and school absences (which are count data) and educational outcomes (which are continuous variables), we aim to triangulate results from three groups of methods: methods traditionally used in epidemiology, methods that rely on the no-unmeasured confounders assumption and, if possible, methods that exploit instrumental variables or difference-in-difference methods.
Table 3.
Identifiability assumptions.
Identifiability assumption | Application to this study | Testing the assumption: can we meet it? |
---|---|---|
No interference: an individual’s (or unit’s) potential outcome does not depend on other individuals’ (units’) treatment assignment, $Y_i(T_1, T_2, \ldots, T_n) = Y_i(T_i)$, where $Y(t)$ is the potential outcome when the intervention $T$ is set to take the value $t$ | Key stage two results/the number of hospital contacts/absences do/does not depend on whether other children receive SEN provision. | Theoretical; we suspect there is residual interference, given the nature of SEN provision (e.g., learning support assistants) in the classroom setting – therefore estimates of the ATE could be biased because of a spill-over effect. We expect the AT(N)T to be less likely impacted by interference. |
Consistency: The intervention is well-defined and corresponds to what is captured in the data. Put another way, the exposure definition must have enough precision that any variation in that exposure does not lead to a different outcome: $Y(t_1) = Y(t_2)$, if $t_1$ and $t_2$ are different versions of the intervention. | We assume that the potential outcome for a given category of SEN provision is the same for all children, even if that provision is delivered differently. Also assumes that school-recorded SEN provision is a good proxy for receipt of SEN provision. | If this assumption is not defensible, we will interpret $E[Y(0)]$ and $E[Y(1)]$ as the averages of the various potential outcomes that would arise from the multiple versions of the exposure seen in the data. |
Positivity: all individuals have a probability greater than 0 (a positive probability) of being assigned each value of the intervention, in every stratum defined by the covariates used to control for confounding, $C$, that allow for the conditional exchangeability assumption to be met, i.e., the confounders. $0 < P(T_i \mid C_i) < 1$ for all $C_i$ | That, when studying educational/health outcomes, there is a non-zero probability of receiving any of the categories of SEN provision, given the relevant confounders. | We will examine propensity score overlap between pairwise comparison groups and limit the analyses to comparisons where common support is found. |
Conditional exchangeability: The assignment mechanism is unrelated to potential outcomes, conditional on covariates, $Y(0), Y(1) \perp\!\!\!\perp T \mid C$, where $T$ is the intervention, $C$ the covariates, and $\perp\!\!\!\perp$ indicates “independence” | After controlling for covariates, individuals in different intervention groups have similar characteristics, i.e., are exchangeable. | Yes, if the correct confounding adjustment set is identified and adjusted for. Alternative estimation methods that do not rely on these assumptions and that target the same causal contrasts will also be pursued (e.g., exploiting IVs), with results compared. |
C=covariates/controls; E=expectation; i=units; P=probability; SEN=special educational needs; T=treatment (or intervention); Y(t)=potential outcome; Y(0)=potential outcome when untreated; Y(1)=potential outcome when treated; IV=instrumental variable
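To show how the assumptions in Table 3 are used, the following standard derivation expresses the mean potential outcome in terms of observed data within one gestational age group. It is a sketch that assumes discrete covariates $C$ (a sum rather than an integral) and uses only the symbols defined in the table footnote.

```latex
\begin{align}
E[Y(t)]
  &= \sum_{c} E[Y(t) \mid C = c]\, P(C = c)
     && \text{(law of total expectation)} \\
  &= \sum_{c} E[Y(t) \mid T = t, C = c]\, P(C = c)
     && \text{(conditional exchangeability, positivity)} \\
  &= \sum_{c} E[Y \mid T = t, C = c]\, P(C = c)
     && \text{(consistency, no interference)}
\end{align}
```

Positivity is needed in the second step because $E[Y(t) \mid T = t, C = c]$ is only defined in strata where $P(T = t \mid C = c) > 0$.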
Our first group of methods will implement the naïve and adjusted estimators using generalised linear models as part of our traditional epidemiological estimates, including Poisson models with a log link (with the logarithm of follow-up time as an offset) for counts of individual health outcomes and absences, and linear models with an identity link for individual educational scores (Arnold et al., 2021). The second group of methods includes outcome-based methods which rely on the no-unmeasured confounding assumption and expand on traditional epidemiological methods by focussing on marginalising results over the population using models such as the parametric g-formula, inverse probability weighting, and inverse probability weighting using regression adjustment (Smith et al., 2022). With these methods inference will be based upon bootstrapping. For both health and educational outcomes, we will calculate and compare the following causal contrasts: observational analogue of the ITT, the overall average treatment effect, the average treatment effect in the treated and the average treatment effect in the not treated (see Table 4 for definitions).
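For count outcomes, the naïve (unadjusted) estimator has a closed form: a Poisson model with a log person-time offset and a single binary exposure gives a rate ratio equal to the ratio of the two crude rates. The sketch below illustrates this with a large-sample confidence interval; the function name and inputs are hypothetical, and the interval assumes Poisson variation with no clustering or overdispersion.

```python
from math import exp, log, sqrt


def crude_rate_ratio(events_treated, time_treated, events_reference, time_reference):
    """Crude rate ratio with an approximate 95% confidence interval.

    Rates are events divided by person-time, which is what the log
    follow-up-time offset achieves in a Poisson regression.
    """
    rate_treated = events_treated / time_treated
    rate_reference = events_reference / time_reference
    ratio = rate_treated / rate_reference
    # Large-sample standard error of log(rate ratio) under a Poisson assumption.
    se_log = sqrt(1 / events_treated + 1 / events_reference)
    lower = exp(log(ratio) - 1.96 * se_log)
    upper = exp(log(ratio) + 1.96 * se_log)
    return ratio, lower, upper
```

With 30 events over 100 person-years in one group and 20 events over 200 person-years in the reference group, the crude rates are 0.3 and 0.1 per person-year, giving a rate ratio of 3. Adjusted estimators replace this comparison with one that accounts for the covariates in Table 1.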
Table 4.
Comparison of causal contrasts.
Causal Contrast | Formal definition | Educational outcome causal question | Health outcome causal question |
---|---|---|---|
Average Treatment Effect (in this setting same as the ITT) | $\mathrm{ATE}(w) = E(Y(T=1) \mid W=w) - E(Y(T=0) \mid W=w)$ | What would be the difference in the average assessment score at key stage two (KS2) if all children born at gestational age w were and were not set to receive SEN provision? What would be the average difference | What would be the difference in the rate of unplanned hospital admissions if all children born at gestational age w were and were not set to receive SEN provision? |
Average Treatment Effect in the Treated | $\mathrm{ATT}(w) = E(Y(T=1) \mid T=1, W=w) - E(Y(T=0) \mid T=1, W=w)$ | What would the difference in average assessment score at KS2 be if all children born at gestational age w who received SEN provision, had not received SEN provision? What would be the average difference | What would the difference in the rate of unplanned hospital admissions be if all children born at gestational age w who received SEN provision, had not received SEN provision? |
Average Treatment Effect in the not treated | $\mathrm{ATNT}(w) = E(Y(T=1) \mid T=0, W=w) - E(Y(T=0) \mid T=0, W=w)$ | What would the difference in the average assessment score at KS2 be if all children born at gestational age w who did not receive SEN provision, had instead received SEN provision? | What would the difference in the rate of unplanned hospital admissions be if all children born at gestational age w who did not receive SEN provision, had instead received SEN provision? |
ITT = intention to treat; T = SEN provision (1 = alternative category of SEN provision, 0 = reference category); SEN = special educational needs; W = week of gestational age, taking a value w between 24 and 42.
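To make the three contrasts in Table 4 concrete, the following sketch applies a non-parametric g-formula (standardisation) within a single gestational-age group. It assumes one discrete confounder `x`, no unmeasured confounding given `x`, and positivity in every stratum. The function name, variable names and toy data are hypothetical; the protocol's parametric g-formula would replace the stratum means with model predictions.

```python
from collections import defaultdict

def causal_contrasts(rows):
    """Non-parametric g-formula for ATE, ATT and ATNT.

    rows: list of (x, t, y), where x is a discrete confounder stratum,
    t is the binary treatment (1 = SEN provision, 0 = reference) and
    y is the outcome. Every stratum must contain both treatment levels.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, t, y in rows:
        sums[(x, t)] += y
        counts[(x, t)] += 1

    def mean_y(x, t):
        # Stratum-specific outcome mean E[Y | X=x, T=t].
        return sums[(x, t)] / counts[(x, t)]

    def standardise(target_rows):
        # Average the stratum-specific effect over the confounder
        # distribution of the chosen target population.
        diffs = [mean_y(x, 1) - mean_y(x, 0) for x, _, _ in target_rows]
        return sum(diffs) / len(diffs)

    return {
        "ATE": standardise(rows),                            # everyone
        "ATT": standardise([r for r in rows if r[1] == 1]),  # treated only
        "ATNT": standardise([r for r in rows if r[1] == 0]), # untreated only
    }

# Hypothetical data: the effect is 2 in stratum x=0 and 6 in stratum x=1,
# and treated children are concentrated in stratum x=1.
toy_rows = [
    (0, 1, 12.0), (0, 0, 9.0), (0, 0, 10.0), (0, 0, 11.0),
    (1, 1, 25.0), (1, 1, 26.0), (1, 1, 27.0), (1, 0, 20.0),
]
contrasts = causal_contrasts(toy_rows)
```

Because the treated are concentrated in the stratum with the larger effect, the three contrasts differ here (ATE = 4, ATT = 5, ATNT = 3), which is why the protocol reports them separately.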
The third group of methods includes instrumental variable and difference-in-difference methods. Instrumental variable methods are only suitable if instruments for SEN provision are identified, for example if there are policy changes in provision that are implemented at different times across local authorities (Greenland, 2018). These would allow us to estimate (under the assumption of individual homogeneity of effects) the observational analogue of the ITT. Related to these are difference-in-difference based methods, which estimate group differences against predicted trajectories between different groups of recorded SEN provision, leading to an estimate of the ATT (Richardson et al., 2023). See Table 4 for the research questions these causal contrasts address.
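The two ideas in this third group can be sketched with their simplest textbook forms: a Wald ratio for a binary instrument and a two-group, two-period difference-in-differences. These are illustrative stand-ins under stated assumptions (a valid binary instrument; parallel trends), not necessarily the estimators that will be used, and all names and numbers below are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def wald_iv(rows):
    """Wald ratio estimator with a binary instrument.

    rows: list of (z, t, y), with z the instrument (e.g. an early-adopting
    local authority), t the treatment received and y the outcome.
    """
    y1 = mean([y for z, t, y in rows if z == 1])
    y0 = mean([y for z, t, y in rows if z == 0])
    t1 = mean([t for z, t, y in rows if z == 1])
    t0 = mean([t for z, t, y in rows if z == 0])
    # Effect of the instrument on the outcome, scaled by its
    # effect on treatment uptake.
    return (y1 - y0) / (t1 - t0)

def did_2x2(pre_treated, post_treated, pre_control, post_control):
    """Two-group, two-period difference-in-differences.

    Under parallel trends, the change in the control group stands in for
    the change the treated group would have had without treatment.
    """
    change_treated = mean(post_treated) - mean(pre_treated)
    change_control = mean(post_control) - mean(pre_control)
    return change_treated - change_control

# Hypothetical instrument data: uptake is 75% when z=1 and 25% when z=0.
iv_rows = [
    (1, 1, 14.0), (1, 1, 14.0), (1, 1, 14.0), (1, 0, 10.0),
    (0, 1, 14.0), (0, 0, 10.0), (0, 0, 10.0), (0, 0, 10.0),
]
iv_estimate = wald_iv(iv_rows)

# Hypothetical outcome scores before and after a change in provision.
did_estimate = did_2x2([10.0, 12.0], [15.0, 17.0], [9.0, 11.0], [11.0, 13.0])
```

In the toy data the Wald ratio is 2 / 0.5 = 4, and the difference-in-differences estimate is 5 - 2 = 3.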
Missing data. To deal with missing covariate values (there are no missing exposure data by design), we will use Imputation using Chained Equations (ICE) as part of the bootstrap-based estimation of confidence intervals for point estimates: imputation will be carried out within each bootstrap replicate. All variables will be used to predict missing data, including the exposure and the outcome, and any other variables assumed to be informative of the missing values (Azur et al., 2011).
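The order of operations described here (resample first, then impute within each replicate, then estimate) can be sketched as follows. The imputation step is a deliberately simple stand-in: it fills a missing covariate with the observed mean in the same treatment group, whereas chained equations would use all variables, including the outcome. The data-generating code, function names and parameter values are hypothetical.

```python
import random

def impute_by_group_mean(rows):
    """Stand-in for chained equations: fill missing x (None) with the
    observed mean of x in the same treatment group.
    Assumes each treatment group has at least one observed x."""
    filled = []
    for t_val in (0, 1):
        group = [r for r in rows if r[0] == t_val]
        observed = [x for _, x, _ in group if x is not None]
        fill = sum(observed) / len(observed)
        filled += [(t, fill if x is None else x, y) for t, x, y in group]
    return filled

def adjusted_effect(rows):
    """OLS coefficient of t in y ~ 1 + t + x (closed form)."""
    n = len(rows)
    mt = sum(t for t, _, _ in rows) / n
    mx = sum(x for _, x, _ in rows) / n
    my = sum(y for _, _, y in rows) / n
    stt = sum((t - mt) ** 2 for t, _, _ in rows)
    sxx = sum((x - mx) ** 2 for _, x, _ in rows)
    stx = sum((t - mt) * (x - mx) for t, x, _ in rows)
    sty = sum((t - mt) * (y - my) for t, _, y in rows)
    sxy = sum((x - mx) * (y - my) for _, x, y in rows)
    return (sty * sxx - sxy * stx) / (stt * sxx - stx ** 2)

def bootstrap_ci(rows, n_boot=500, seed=1):
    """Percentile 95% interval; imputation is redone in every replicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        estimates.append(adjusted_effect(impute_by_group_mean(sample)))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Hypothetical data: true effect of t is 2.0; about 20% of x is missing.
gen = random.Random(0)
data = []
for _ in range(200):
    x = gen.gauss(0, 1)
    t = 1 if gen.random() < 0.5 else 0
    y = 2.0 * t + x + gen.gauss(0, 1)
    data.append((t, None if gen.random() < 0.2 else x, y))

lower, upper = bootstrap_ci(data)
```

Imputing inside the loop means the interval reflects uncertainty from both sampling and imputation, which is the motivation for combining the two steps.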
Sensitivity analyses. We aim to conduct a series of sensitivity analyses to assess the robustness of our results. Firstly, we will shift our assignment of recorded SEN provision from year one to year two to account for the administrative time it takes for parents/carers to apply for SEN provision. One of our criteria is that pupils must have data on their EYFSP school readiness tests, as this is a major confounding variable; this may restrict our population to those able to take the test. Hence, to account for this non-participation, we will use a missingness indicator to capture the information held in missing the test and avoid excluding those without a record (Groenwold et al., 2012). Furthermore, we suspect there may be missingness in outcome data, particularly for key stage two scores, based upon prior knowledge of systematic teacher strikes; in such cases we will impute these key stage two outcomes, using year of testing in the imputation model. Finally, we propose analysing the correlation between recorded child sex (reported by physician in HES) and gender (submitted by parent/carer during school registration in NPD). To understand the validity of our models, we will produce a table showing how using either variable affects our point estimates for the intervention variable only.
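The missing-indicator method mentioned above can be sketched in a few lines: the missing covariate is set to a fixed constant and a companion 0/1 indicator records that it was missing, so both columns can enter the model and no child is dropped. The function name, the fill value and the toy scores are hypothetical.

```python
def add_missing_indicator(scores, fill_value=0.0):
    """Return (filled_scores, indicators) for a covariate with None = missing.

    filled_scores replaces each missing value with fill_value;
    indicators is 1 where the original value was missing, else 0.
    Both lists are then used together as covariates.
    """
    filled = [fill_value if s is None else s for s in scores]
    indicators = [1 if s is None else 0 for s in scores]
    return filled, indicators

# Hypothetical EYFSP scores, with one child who has no recorded test.
eyfsp = [34.0, None, 51.0]
filled_scores, missing_flag = add_missing_indicator(eyfsp)
```

The choice of fill value does not affect the fitted treatment coefficient when the indicator is included as a main effect, since the indicator absorbs the shift.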
Ethics
Permissions to use linked, de-identified data from Hospital Episode Statistics and the National Pupil Database were granted by DfE (DR200604.02B) and NHS Digital (DARS-NIC-381972). Consent from patients is not required for HES, as the data provided by NHS Digital are pseudo-anonymised, which reduces identifiability to researchers; further information on opting out of Hospital Episode Statistics for secondary usage can be found here. Ethical approval for the ECHILD project was granted by the National Research Ethics Service (17/LO/1494), NHS Health Research Authority Research Ethics Committee (20/EE/0180) and UCL Great Ormond Street Institute of Child Health’s Joint Research and Development Office (20PE06).
Acknowledgments
We gratefully acknowledge all children and families whose de-identified data are used in this research. We would like to acknowledge the contribution of the wider HOPE study team to this work: Sarah Barnes, Kate Boddy, Kristine Black-Hawkins, Lorraine Dearden, Tamsin Ford, Katie Harron, Lucy Karwatowska, Matthew Lilliman, Stuart Logan, Jacob Matthews, Jugnoo Rahi, Jennifer Saxton, Isaac Winterburn and Ania Zylbersztejn. We thank Ruth Blackburn, Matthew Jay, Farzan Ramzan, and Antony Stone for ECHILD database support.
Notes
[version 1; peer review: 1 approved, 2 approved with reservations]
Funding Statement
This project is funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research Programme (Grant Reference Number NIHR202025). RG is supported by a NIHR Senior Investigator award. ECHILD is supported by ADR UK (Administrative Data Research UK), an Economic and Social Research Council (part of UK Research and Innovation) programme (Grant Reference Numbers ES/V000977/1, ES/X003663/1, ES/X000427/1). Research at UCL Great Ormond Street Institute of Child Health is supported by the NIHR Great Ormond Street Hospital Biomedical Research Centre. The ECHILD Database uses data from the Department for Education (DfE). The DfE does not accept responsibility for any inferences or conclusions derived by the authors. This work contains statistical data from ONS which is Crown Copyright. The use of the ONS statistical data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. This work uses research datasets which may not exactly reproduce National Statistics aggregates. This research contributes to but was not commissioned by the NIHR Policy Research Programme. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data availability
No data are associated with this article.
References
- Alterman N, Johnson S, Carson C, et al.: Gestational age at birth and child special educational needs: a UK representative birth cohort study. Arch Dis Child. 2021;106(9):842–848. doi:10.1136/archdischild-2020-320213
- Arnold KF, Davies V, de Kamps M, et al.: Reflection on modern methods: generalized linear models for prognosis and intervention-theory, practice and implications for machine learning. Int J Epidemiol. 2021;49(6):2074–2082. doi:10.1093/ije/dyaa049
- Azur MJ, Stuart EA, Frangakis C, et al.: Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res. 2011;20(1):40–49. doi:10.1002/mpr.329
- Coathup V, Boyle E, Carson C, et al.: Gestational age and hospital admissions during childhood: population based, record linkage study in England (TIGAR study). BMJ. 2020;371:m4075. doi:10.1136/bmj.m4075
- Fitzsimons KJ, Copley LP, Setakis E, et al.: Early academic achievement in children with isolated clefts: a population-based study in England. Arch Dis Child. 2018;103(4):356–362. doi:10.1136/archdischild-2017-313777
- Glinianaia SV, McLean A, Moffat M, et al.: Academic achievement and needs of school-aged children born with selected congenital anomalies: A systematic review and meta-analysis. Birth Defects Res. 2021;113(20):1431–1462. doi:10.1002/bdr2.1961
- GOV.UK: Get Information about Schools. 2022 (Accessed: 11 July 2022).
- Greenland S: An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2018;47(1):358. doi:10.1093/ije/dyx275
- Groenwold RHH, White IR, Donders ART, et al.: Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. CMAJ. 2012;184(11):1265–1269. doi:10.1503/cmaj.110977
- Hardelid P, Dattani N, Gilbert R, et al.: Estimating the prevalence of chronic conditions in children who die in England, Scotland and Wales: a data linkage cohort study. BMJ Open. 2014;4(8):e005331. doi:10.1136/bmjopen-2014-005331
- Harron K, Gilbert R, Cromwell D, et al.: International comparison of emergency hospital use for infants: data linkage cohort study in Canada and England. BMJ Qual Saf. 2018;27(1):31–39. doi:10.1136/bmjqs-2016-006253
- Herbert A, Wijlaars L, Zylbersztejn A, et al.: Data Resource Profile: Hospital Episode Statistics Admitted Patient Care (HES APC). Int J Epidemiol. 2017;46(4):1093–1093i. doi:10.1093/ije/dyx015
- Hernán MA: Beyond exchangeability: The other conditions for causal inference in medical research. Stat Methods Med Res. 2012;21(1):3–5. doi:10.1177/0962280211398037
- Hernán MA, Wang W, Leaf DE: Target Trial Emulation: A Framework for Causal Inference From Observational Data. JAMA. 2022;328(24):2446–2447. doi:10.1001/jama.2022.21383
- Hutchinson J: Identifying Pupils with Special Educational Needs and Disabilities. 2021.
- Jay MA, Grath-Lone LM, De Stavola B, et al.: Evaluation of pushing out of children from all English state schools: Administrative data cohort study of children receiving social care and their peers. Child Abuse Negl. 2022;127:105582. doi:10.1016/j.chiabu.2022.105582
- Lee BK, Lessler J, Stuart EA: Improving propensity score weighting using machine learning. Stat Med. 2009;29(3):337–346. doi:10.1002/sim.3782
- Libuy N, Gilbert R, Mc Grath-Lone L, et al.: Gestational age at birth, chronic conditions and school outcomes: a population-based data linkage study of children born in England. Int J Epidemiol. 2023;52(1):132–143. doi:10.1093/ije/dyac105
- Libuy N, Harron K, Gilbert R, et al.: Linking education and hospital data in England: linkage process and quality. Int J Popul Data Sci. 2021;6(1):1671. doi:10.23889/ijpds.v6i1.1671
- Liu Y, Bessudnov A, Black A, et al.: School autonomy and educational inclusion of children with special needs: Evidence from England. Br Educ Res J. 2020;46(3):532–552. doi:10.1002/berj.3593
- Long R, Danechi S: Special Educational Needs: Support in England. House of Commons, 2023.
- Mc Grath-Lone L, Libuy N, Harron K, et al.: Data Resource Profile: The Education and Child Health Insights from Linked Data (ECHILD) Database. Int J Epidemiol. 2022;51(1):17–17f. doi:10.1093/ije/dyab149
- Mowitz ME, Gao W, Sipsma H, et al.: Burden of Comorbidities and Healthcare Resource Utilization Among Medicaid-Enrolled Extremely Premature Infants. J Health Econ Outcomes Res. 2022;9(2):147–155. doi:10.36469/001c.38847
- ONS: Pregnancy and ethnic factors influencing births and infant mortality: 2013. 2015.
- Rassen JA, Shelat AA, Franklin JM, et al.: Matching by Propensity Score in Cohort Studies with Three Treatment Groups. Epidemiology. 2013;24(3):401–409. doi:10.1097/EDE.0b013e318289dedf
- Richardson DB, Ye T, Tchetgen Tchetgen EJ: Generalized Difference-in-Differences. Epidemiology. 2023;34(2):167–174. doi:10.1097/EDE.0000000000001568
- Schomaker M, Heumann C: Bootstrap inference when using multiple imputation. Stat Med. 2018;37(14):2252–2266. doi:10.1002/sim.7654
- Smith MJ, Mansournia MA, Maringe C, et al.: Introduction to computational causal inference using reproducible Stata, R, and Python code: A tutorial. Stat Med. 2022;41(2):407–432. doi:10.1002/sim.9234
- Zhu Y, Hubbard RA, Chubak J, et al.: Core concepts in pharmacoepidemiology: Violations of the positivity assumption in the causal analysis of observational data: Consequences and statistical approaches. Pharmacoepidemiol Drug Saf. 2021;30(11):1471–1485. doi:10.1002/pds.5338
- Reviewer response for version 1
2023; 3: 59.
Published online 2023 Nov 21. doi:10.3310/nihropenres.14616.r30748
Chary Akmyradov, Referee
This study protocol is centered on assessing the effects of Special Educational Needs (SEN) support on the health and educational outcomes of English primary school students, with a particular emphasis on their birth gestational ages. Utilizing the ECHILD database, the study will examine children born in NHS hospitals from 2003 to 2008. The objective is to determine the influence of SEN support on factors such as unplanned hospital visits, academic performance, and attendance rates. Employing a trial emulation approach, the research will analyze observational data linked to healthcare information, covering students from their first through sixth years in primary school. The study involves comprehensive statistical analysis and multiple sensitivity tests to verify the reliability of the findings. This research is crucial in gauging the effectiveness of SEN support in English primary schools, notably its varying impact based on the children's gestational ages at birth.
The protocol provides a clear rationale for the study, highlighting the need to understand the impact of SEN provision in a nuanced way, considering the gestational age of children. It also clearly outlines its objectives, focusing on a range of significant health and educational outcomes. This clarity in the rationale and objectives ensures that the study is targeted and relevant to the needs of children requiring SEN provision.
The study design described in the article seems appropriate for addressing the research question regarding the impact of Special Educational Needs (SEN) provision on health and educational outcomes in English primary school children, particularly in relation to their gestational age at birth. Here's why:
Use of the ECHILD Database: The study's reliance on the Education and Child Health Insights from Linked Data (ECHILD) database is suitable as it provides comprehensive, linked administrative data. This database allows for a robust analysis of educational and health outcomes across a large population of children.
Focus on a Specific Cohort: By concentrating on children born in NHS hospitals between 2003 and 2008 and following them from the first to the sixth year of primary education, the study can closely monitor and analyze the long-term effects of SEN provisions.
Trial Emulation Framework: The use of a trial emulation approach, which uses observational data in a manner similar to a clinical trial, is an innovative method. It can effectively assess causal relationships in situations where randomized controlled trials are not feasible or ethical.
Consideration of Gestational Age: Stratifying children based on their gestational age at birth is a critical factor, as preterm birth can significantly influence the need for SEN provision and health outcomes. This stratification helps in understanding the variability in the impact of SEN provisions.
Comprehensive Outcomes Analysis: The study's focus on a range of outcomes, including unplanned hospital utilization, educational attainment, and school absences, provides a holistic view of the impact of SEN provisions.
Statistical Analysis and Sensitivity Analyses: The plan to use multiple statistical methods and conduct sensitivity analyses suggests a thorough approach to data analysis, enhancing the reliability and validity of the findings.
The details provided in the methods section appear sufficient to allow replication. However (not for the protocol), complete replication would require more detailed information in the supplementary materials of the final manuscript on aspects such as the exact statistical models, data cleaning and processing procedures, specific definitions of SEN provision categories, and detailed criteria for subgroup classifications. These details could be described in the protocol.
Data sources are described in detail, without presenting actual data records.
Is the study design appropriate for the research question?
Yes
Is the rationale for, and objectives of, the study clearly described?
Yes
Are sufficient details of the methods provided to allow replication by others?
Yes
Are the datasets clearly presented in a useable and accessible format?
Not applicable
Reviewer Expertise:
Biostatistics with focus neonatology, cardiology, and educational outcomes analyses.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
- Reviewer response for version 1
2023; 3: 59.
Published online 2023 Nov 21. doi:10.3310/nihropenres.14616.r31303
Uttara Partap, Referee
This manuscript describes the protocol for an analysis using a target trial emulation approach to assess the effect of special educational needs (SEN) provision on key health and educational indicators, stratified by gestational age at birth, among children in English primary schools who were born in an NHS England hospital between 2003 and 2008. The trial uses data from the ECHILD database, which contains linked administrative school and hospital records. The protocol describes an important study which has the potential to advance our understanding of the effect of SEN provision on key indicators, and whether this differs by gestational age at birth. The manuscript is detailed and well written. I have a few points for consideration for the authors:
In the Background, it is mentioned that there are two categories of SEN provision: SEN support and Education, Health and Care Plans (EHCP). However, in the Methods/Intervention variable, the authors note that the intervention consists of four categories of SEN provision: none, SEN support, EHCP at a mainstream school, and special school attendance where the majority of children have an EHCP.
Perhaps in the Background, it would be good to also split the categories as in the methods (e.g. mention that EHCP could be delivered in a mainstream school or a special school).
It might be helpful to quantify the “vast majority” of children having EHCP, to understand how homogeneous the intervention is in that category.
Table 2 row “Treatments to be compared” outlines the last category as “special school attendance”, which adds a little bit to the confusion. It would be good to ensure that there is consistency and clarity in the definition.
Could the authors outline a little bit the choice of focus on singleton children for the analysis – what is the rationale for excluding those who were from multiple gestations? It would most likely require a distinct analysis, but given that multiple gestations are often a risk factor for preterm birth, I wonder whether a similar analysis focusing on this may also be important.
Methods: the authors note that they are excluding children with a gestational age of <24 or >44 weeks, but from the Subgroups subsection, it appears that children born post term (>=42 weeks GA) are also excluded. It may be good to clarify this and check consistency.
Under the sub-heading for Intervention variable, the authors note that an approach analogous to intention-to-treat will be used which focuses on treatment assignment rather than delivery or adherence. One thought, though, may be to examine and check that the duration of exposure to the intervention is balanced across arms/categories. I wonder whether the authors have any thoughts on this.
Minor comments
The aim of the analysis is clear in the Abstract, but not so much in the Background. Re-iterating the aim in a single statement in the Background may be helpful to anchor the rest of the manuscript.
Table 1 footnotes – may be good to expand IMD, IDACI, ICD.
Is the study design appropriate for the research question?
Yes
Is the rationale for, and objectives of, the study clearly described?
Yes
Are sufficient details of the methods provided to allow replication by others?
Partly
Are the datasets clearly presented in a useable and accessible format?
Not applicable
Reviewer Expertise:
Maternal and child health, epidemiology, adolescent health, nutrition
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
- Reviewer response for version 1
2023; 3: 59.
Published online 2023 Nov 21. doi:10.3310/nihropenres.14616.r30743
Neora Alterman, Referee
This is a study protocol for an investigation of the causal impact of special educational needs provision on academic and health outcomes stratified by gestational age at birth. The protocol is well written and thought through, yet I have several comments.
The abstract states that one third of children in English primary schools have SEN provision but this is not mentioned in the protocol text itself. Please also add a reference.
The background section describes SEN provision in England and the negative relationship between gestational age at birth and SEN. Please explain in further detail why gestational age is expected to be a modifier of the effect of SEN provision on academic outcomes and hospital utilization. Is gestational age a surrogate for the indication for special educational needs provision?
The rationale and potential mechanism of the impact of SEN provision on academic outcomes and school absences is straightforward. However, further clarification about the rationale behind the possible impact of SEN provision on unplanned hospital admissions would be beneficial.
The background section states that there is limited evidence on the impact of SEN provision on academic performance, school absences and hospital utilisation. Are there no studies examining the impact of SEN on these outcomes using trials, natural experiments, or observational causal inference methods? If so, this requires clarification.
The final sentence in the methods section of the abstract has ‘a variety’ twice.
The authors may want to categorize the ‘term’ gestational age subgroup in a more refined manner, according to the categorization of the American College of Obstetricians and Gynecologists (ACOG) Ref [1]. This includes ‘early term’ (weeks 37-38), ‘full term’ (39-40), ‘late term’ (41) and ‘post term’ (42 and above). The early term group was shown to have higher rates of inpatient hospital admissions (Khantzian EJ et al, Ref [2]) and worse results in Key Stage 1 SATs compared with week 40 (Lewis Carl et al, Ref [3]). The late term and post term groups have poorer clinical outcomes, for different reasons than preterm birth, and it may thus be worthwhile to examine them separately. Please note that there is also inconsistency in the inclusion of late pregnancy births. Children with a gestational age >44 are said to be excluded, but the remaining post term births are not included in any of the subgroups.
In Table 2 the authors address how they plan to investigate the eligibility criteria of the emulation study compared with the target trial. The possibility that several of the planned restrictions and exclusions might lead to bias should be discussed. These include exclusion of children with missing gestational age in HES data and restriction to singleton births (likely necessary due to the challenge of linking twins). Collider bias arising from the inevitable restriction to state schools only should be discussed as well, since a child’s need for special education may affect the parents’ choice to enrol in a state or private school.
Lastly, it would be helpful to add details about the ECHILD database, such as the number of children included.
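For illustration only, the finer term categorisation suggested in the comments above could be encoded as below, combining the protocol's preterm bands (24 to <28, 28 to <32, 32 to <34, 34 to <37 weeks) with the ACOG term bands quoted in this report. The function name is hypothetical, completed weeks are assumed to be integers, and the handling of births below 24 weeks is an assumption rather than something specified in the protocol.

```python
def gestational_age_group(weeks):
    """Map completed weeks of gestation to a subgroup label.

    Preterm bands follow the protocol; term bands follow the ACOG
    categorisation (early term 37-38, full term 39-40, late term 41,
    post term 42 and above).
    """
    if weeks < 24:
        # Assumption: births before 24 weeks are outside the study population.
        raise ValueError("gestational age below 24 weeks is not classified")
    # Each entry is (exclusive upper bound in weeks, label).
    bands = [
        (28, "extremely preterm"),
        (32, "very preterm"),
        (34, "moderately preterm"),
        (37, "late preterm"),
        (39, "early term"),
        (41, "full term"),
        (42, "late term"),
    ]
    for upper, label in bands:
        if weeks < upper:
            return label
    return "post term"

labels = [gestational_age_group(w) for w in (27, 33, 38, 40, 41, 42)]
```

Keeping the bands in a single ordered list makes it easy to check that every week from 24 upwards falls into exactly one subgroup, which is the consistency issue raised above.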
Is the study design appropriate for the research question?
Yes
Is the rationale for, and objectives of, the study clearly described?
Partly
Are sufficient details of the methods provided to allow replication by others?
Partly
Are the datasets clearly presented in a useable and accessible format?
Yes
Reviewer Expertise:
Perinatal epidemiologist
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
References
1. ACOG Committee Opinion No 579: Definition of term pregnancy. Obstet Gynecol. 2013;122(5):1139-1140. doi:10.1097/01.AOG.0000437385.88715.4a
2. The self-medication hypothesis of substance use disorders: a reconsideration and recent applications. Harv Rev Psychiatry. 1997;4(5):231-44. doi:10.3109/10673229709030550
3. An indirect immunofluorescence procedure for staining the same cryosection with two mouse monoclonal primary antibodies. J Histochem Cytochem. 1993;41(8):1273-8. doi:10.1177/41.8.7687266
Articles from NIHR Open Research are provided here courtesy of Department of Health and Social Care (UK)