The epidemics that health systems periodically suffer require valid and detailed
real-time information on their evolution, together with short-, medium- and long-term
predictions, so that the health system can organize itself in advance to address the
resulting health problems. The objectives of this proposal are: to study the usefulness
of the health system's information and data storage system as a source for quickly and
efficiently obtaining the data needed to model an epidemiological outbreak; to model the
outbreak in order to predict its evolution; and to present the results in a way that
supports decision making. The investigators will rely on the experience gained so far
during the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic to
define semi-automatic and flexible criteria for searching, extracting, cleaning and
aggregating data. Predictions of incidence, number of hospital and ICU admissions, and
number of deaths will be made at the Basque Country level. In the analysis of temporal
data, especially in the context of the pandemic, it is essential to have robust tools
that allow accurate predictions. In this study, the investigators employ P-splines based
on the negative binomial distribution to predict pandemic-related positive cases,
hospital admissions, and ICU admissions.
Design. Retrospective observational study. The modeling will be based on the SARS-CoV-2
pandemic that started at the beginning of 2020.
Subjects of the study. Daily incidence data aggregated by age and sex will be collected
for: tests performed, positive cases, hospital and ICU admissions for SARS-CoV-2,
hospital and ICU discharges, recoveries, and deaths (in the ICU, in hospital or in the
community) of individuals with coronavirus disease 2019 (COVID-19).
Criteria for inclusion. For positive cases: a SARS-CoV-2 infection laboratory-confirmed
by a positive result on the reverse transcriptase-polymerase chain reaction assay for
severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) or a positive antigen test
between March 1, 2020 and January 9, 2022.
For hospital admissions: hospital admissions since the start of the pandemic. Episodes
linked by a transfer from one center to another are counted as a single admission. Only
admissions due to COVID-19 are considered.
Exclusion criteria: Patients admitted for other reasons who have developed the disease
during their hospital stay.
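The rule that transfer-linked episodes count as a single admission can be illustrated with a minimal sketch. The record layout (`patient_id`, `center`, `admitted`, `discharged`), the pseudonymous identifiers, and the one-day gap used to recognize a transfer are all assumptions made for this example; the study does not specify how transfers are detected.

```python
from datetime import date
from itertools import groupby


def merge_transfers(episodes, max_gap_days=1):
    """Collapse transfer-linked episodes of one patient into single admissions.

    `episodes` is a list of dicts with hypothetical keys:
    patient_id, center, admitted (date), discharged (date).
    An episode is treated as a transfer when it starts at a different center
    within `max_gap_days` of the previous discharge (an assumed rule).
    """
    merged = []
    ordered = sorted(episodes, key=lambda e: (e["patient_id"], e["admitted"]))
    for _, group in groupby(ordered, key=lambda e: e["patient_id"]):
        current = None
        for ep in group:
            is_transfer = (
                current is not None
                and ep["center"] != current["centers"][-1]
                and (ep["admitted"] - current["discharged"]).days <= max_gap_days
            )
            if is_transfer:
                # Extend the ongoing admission instead of counting a new one.
                current["discharged"] = max(current["discharged"], ep["discharged"])
                current["centers"].append(ep["center"])
            else:
                if current is not None:
                    merged.append(current)
                current = {
                    "patient_id": ep["patient_id"],
                    "admitted": ep["admitted"],
                    "discharged": ep["discharged"],
                    "centers": [ep["center"]],
                }
        if current is not None:
            merged.append(current)
    return merged


# Illustrative records only; not study data.
episodes = [
    {"patient_id": "p1", "center": "A", "admitted": date(2020, 4, 1), "discharged": date(2020, 4, 3)},
    {"patient_id": "p1", "center": "B", "admitted": date(2020, 4, 3), "discharged": date(2020, 4, 10)},
    {"patient_id": "p1", "center": "A", "admitted": date(2020, 11, 2), "discharged": date(2020, 11, 6)},
    {"patient_id": "p2", "center": "A", "admitted": date(2020, 4, 2), "discharged": date(2020, 4, 5)},
]
admissions = merge_transfers(episodes)
```

In this example the first two episodes of `p1` become one admission spanning both centers, while the readmission months later is counted separately.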
Variables. The data to be collected are aggregated daily incidence counts. The
population will be stratified into ten age groups (0 - 9, 10 - 19, ..., 70 - 79, 80 - 89,
90+) and by sex. Variables:
- Individuals in the study population by age.
- Number of new confirmed positive cases of COVID-19 by age and day.
- Number of new hospital admissions due to COVID-19 by age and day.
- Number of ICU admissions due to COVID-19 by age and day.
- Number of total deaths from COVID-19 by age and day.
- Number of hospital discharges (live patients) of patients who have been hospitalized
  for COVID-19 by age and day (excluding transfers).
- Number of deaths in hospital due to COVID-19 by age and day.
- Number of deaths in the ICU due to COVID-19 by age and day.
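As an illustration of the stratification described above, the following sketch maps an age to one of the ten strata and counts events per day, age group and sex. The function names and the `(day, age, sex)` event tuples are hypothetical; the actual extraction pipeline is not described in this record.

```python
from collections import Counter
from datetime import date


def age_group(age):
    """Map an age in years to one of ten strata: 0-9, 10-19, ..., 80-89, 90+."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= 90:
        return "90+"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def daily_counts(events):
    """Count events per (day, age group, sex) from (day, age, sex) tuples."""
    return Counter((day, age_group(age), sex) for day, age, sex in events)


# Illustrative events only; not study data.
events = [
    (date(2020, 3, 1), 34, "F"),
    (date(2020, 3, 1), 38, "F"),
    (date(2020, 3, 1), 91, "M"),
    (date(2020, 3, 2), 9, "M"),
]
counts = daily_counts(events)
```

Because a `Counter` returns zero for missing keys, days and strata without events do not need to be initialized before lookup.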
The outcome variables that will be obtained from the proposed modeling are:
- Number of estimated positive COVID-19 cases by age and day.
- Number of estimated COVID-19 hospital admissions by age and day.
- Number of estimated total deaths due to COVID-19 by age and day.
- Number of estimated ICU admissions due to COVID-19 by age and day.
Analysis of data. The investigators will use P-splines with a negative binomial
distribution. P-splines, or penalized splines, are a powerful tool for modeling nonlinear
relationships in temporal data. Combining them with the negative binomial distribution
yields a model that is especially suitable for count data with over-dispersion, as is
the case with pandemic data.
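For reference, one common way of writing a negative binomial P-spline model for a daily count y_t is sketched below. The notation is an assumption made for illustration: B_j are B-spline basis functions over time, K is the number of basis functions, D_d is the d-th order difference matrix, lambda is the smoothing parameter and theta is the dispersion parameter. The record does not state the basis degree, the penalty order or how lambda is chosen.

```latex
\begin{aligned}
y_t &\sim \mathrm{NB}(\mu_t, \theta), \qquad
\operatorname{E}[y_t] = \mu_t, \qquad
\operatorname{Var}[y_t] = \mu_t + \frac{\mu_t^{2}}{\theta}, \\
\log \mu_t &= \sum_{j=1}^{K} B_j(t)\, \beta_j, \\
\ell_p(\beta, \theta) &= \ell(\beta, \theta)
  - \frac{\lambda}{2}\, \beta^{\top} D_d^{\top} D_d\, \beta .
\end{aligned}
```

The variance term mu_t^2 / theta is what accommodates over-dispersion: as theta grows, the model approaches a Poisson model with variance equal to the mean. The penalty on differences of adjacent coefficients controls the smoothness of the fitted curve.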
Procedure:
  -  Data Collection: Daily data on positive cases, hospital admissions and ICU
     admissions will be obtained from the beginning of the pandemic until September 2022.
  -  Modeling: A P-splines model based on the negative binomial distribution will be
     fitted to the data. This model will be designed to capture temporal trends and
     seasonal patterns, as well as to handle the over-dispersion present in the data.
  -  Model with Random Effect for Day of the Week: Specifically for the prediction of
     hospital admissions, a random effect for the day of the week will be incorporated.
     This adjustment will be made because systematic variability in admissions was
     identified depending on the day of the week. Incorporating this random effect is
     expected to improve the accuracy of the model for this variable.
  -  Prediction: Predictions will be made for two time horizons: short term (1 and 2
     days) and medium term (5 days). These predictions will allow us to anticipate the
     evolution of the pandemic and make informed decisions.
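The day-of-week adjustment for hospital admissions described in the procedure can be written, under assumed notation, as an extra term in the linear predictor. Here mu_t is the expected number of admissions on day t, B_j are B-spline basis functions with coefficients beta_j, w(t) is the weekday of day t, and u_w is a weekday-specific random effect with variance sigma_u^2. The Gaussian form of the random effect is a conventional choice assumed for this sketch, not a detail stated in the record.

```latex
\begin{aligned}
\log \mu_t &= \sum_{j=1}^{K} B_j(t)\, \beta_j + u_{w(t)},
  \qquad w(t) \in \{1, \dots, 7\}, \\
u_w &\sim \mathcal{N}(0, \sigma_u^{2}), \qquad w = 1, \dots, 7 .
\end{aligned}
```

On the count scale, exp(u_w) acts as a multiplicative weekday factor on the smooth trend, which is how a systematic dip or rise in admissions on particular days would be absorbed.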
Validation of Predictions: To validate the accuracy and robustness of the predictions, a
retrospective analysis will be carried out at different times (or waves) of the pandemic.
Model predictions will be compared to actual observed data, and error metrics will be
calculated to evaluate model performance.
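A retrospective check of this kind can be sketched as a rolling-origin evaluation over the horizons named above (1, 2 and 5 days). The record does not name the error metrics; MAE, RMSE and MAPE are assumed here as typical choices. The `forecast` callable is hypothetical and stands in for the fitted model, and a naive "last value carried forward" rule is used only so that the example runs.

```python
import math


def error_metrics(observed, predicted):
    """Return MAE, RMSE and MAPE (in percent) for paired series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # MAPE is undefined for zero counts, so those days are skipped.
    pct = [abs(e) / o for e, o in zip(errors, observed) if o != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


def rolling_origin(series, forecast, horizons=(1, 2, 5), min_train=14):
    """Score `forecast` at each horizon, moving the forecast origin forward.

    `forecast(history, h)` is a hypothetical callable returning the
    prediction for h days after the last value in `history`.
    """
    results = {}
    for h in horizons:
        obs, pred = [], []
        for origin in range(min_train, len(series) - h + 1):
            obs.append(series[origin + h - 1])
            pred.append(forecast(series[:origin], h))
        results[h] = error_metrics(obs, pred)
    return results


def naive(history, h):
    """Placeholder forecaster: carry the last observed value forward."""
    return history[-1]


# Illustrative wave-shaped series; not study data.
series = [10, 12, 15, 19, 24, 30, 37, 45, 52, 58, 61,
          60, 55, 48, 40, 33, 27, 22, 18, 15, 13]
scores = rolling_origin(series, naive)
```

Restricting `series` to the days of a single wave would give per-wave scores, which matches the idea of validating at different times of the pandemic.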
Limitations. One limitation of the study is the possible loss of hospitalizations and
deaths (or recoveries) due to the disease in individuals whose temporal sequence of
testing, admission and death (or recovery) did not follow the sequence assumed in the
searches carried out.
Ethical aspects. This study uses only anonymized information to meet its objectives.
No data that could identify a patient are available.
The processing, communication and transfer of personal data of all participants comply
with Regulation (EU) 2016/679 (General Data Protection Regulation) on the protection of
natural persons with regard to the processing of personal data and on the free movement
of such data, and with Organic Law 3/2018, of December 5, on the Protection of Personal
Data and guarantee of digital rights. Virtually all of the data necessary for this study
is aggregated data that in no case can be associated with individuals. All information
will be treated as strictly confidential.
Regarding informed consent, the research team proposes carrying out the study without
asking patients for it. This proposal is based on article 58 of Law 14/2007, of July 3,
on Biomedical Research ("...exceptionally, coded or identified samples may be treated
for the purposes of biomedical research without the consent of the source subject, when
obtaining said consent is not possible or represents an unreasonable effort. In these
cases, the favorable opinion of the corresponding Research Ethics Committee will be
required.").
Inclusion Criteria:
For positive cases:
  -  A SARS-CoV-2 infection laboratory-confirmed by a positive result on the reverse
     transcriptase-polymerase chain reaction assay for severe acute respiratory syndrome
     coronavirus 2 (SARS-CoV-2) or a positive antigen test between March 1, 2020 and
     January 9, 2022.
For hospital admissions:
  -  Different episodes are counted as a single admission when they are linked by
     transfers from one center to another.
  -  Only admissions due to COVID-19 are included.
Exclusion Criteria:
  -  Patients admitted for other reasons who developed the disease during their
     hospital stay.
Hospital Galdakao-Usansolo
Galdakao, Bizkaia, Spain