# Introduction to biostatistics

Why should a health professional study statistics? As the sciences become more exact, they become more mathematical. When we move from the single clinical case to hundreds or thousands of patients, patterns begin to emerge, and analysis is needed to determine whether a pattern reflects chance variation or whether something real is causing it.

Role of biostatistics and data science

ASA statement on the role of statistics in data science:

http://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/

See http://psc.dss.ucdavis.edu/sommerb/sommerdemo/

[MIT Cheat Sheet](http://web.mit.edu/~csvoss/Public/usabo/stats_handout.pdf)

[Cyclismo R tutorial](http://www.cyclismo.org/tutorial/R/)

[Good introduction and visualizations](https://www.ocf.berkeley.edu/~jfkihlstrom/IntroductionWeb/statistics_supplement.htm)

[Example syllabus for a statistics course](http://www.statsoft.com/Textbook)

[See the Vanderbilt wiki](http://biostat.mc.vanderbilt.edu/wiki/Main/ClinStat)

[See RCompanion](http://rcompanion.org/handbook/B_01.html)

[Introduction to Statistics book (OpenStax)](https://cnx.org/contents/MBiUQmmY@18.114:2T34_25K@11/Introduction)

[Statistics review Crit Care](https://www.ncbi.nlm.nih.gov/pmc/?term=Statistics+review%5BTI%5D+AND+Crit+Care.)

[STATS4STEM](http://www.stats4stem.org/site)

See [Visualizations](http://students.brown.edu/seeing-theory/index.html)

See [Review of Basic Statistical Concepts](https://onlinecourses.science.psu.edu/statprogram/review_of_basic_statistics) and [Stats 200 Penn State](https://onlinecourses.science.psu.edu/stat200/node/19)

Exercises in R, part 1 and part 2

[Learning Statistics with R by Danielle Navarro](https://learningstatisticswithr.com/)

[Answering questions with data](https://crumplab.github.io/statistics/index.html#copying-the-textbook)

# General

Fry, C., 2009. Experimental data and design, and the role of statistics. Surgery 27, 375–380.

Kestin, I., 2018. Statistics in clinical trials and audit. Anaesthesia & Intensive Care Medicine 19, 144–148.

Thiese, M.S., Arnold, Z.C., Walker, S.D., 2015. The misuse and abuse of statistics in biomedical research. Biochem. Med. 25, 5–11.

## Good practices in data science

From https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

1. Data management

- Save the raw data.
- Ensure that raw data are backed up in more than one location.
- Create the data you wish to see in the world.
- Create analysis-friendly data.
- Record all the steps used to process data.
- Anticipate the need to use multiple tables, and use a unique identifier for every record.
- Submit data to a reputable DOI-issuing repository so that others can access and cite it.

2. Software

- Place a brief explanatory comment at the start of every program.
- Decompose programs into functions.
- Be ruthless about eliminating duplication.
- Always search for well-maintained software libraries that do what you need.
- Test libraries before relying on them.
- Give functions and variables meaningful names.
- Make dependencies and requirements explicit.
- Do not comment and uncomment sections of code to control a program's behavior.
- Provide a simple example or test data set.
- Submit code to a reputable DOI-issuing repository.

3. Collaboration

- Create an overview of your project.
- Create a shared “to-do” list for the project.
- Decide on communication strategies.
- Make the license explicit.
- Make the project citable.

4. Project organization

- Put each project in its own directory, which is named after the project.
- Put text documents associated with the project in the doc directory.
- Put raw data and metadata in a data directory and files generated during cleanup and analysis in a results directory.
- Put project source code in the src directory.
- Put external scripts or compiled programs in the bin directory.
- Name all files to reflect their content or function.
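The directory layout in the list above can be created with a few lines of code. The following is a minimal sketch, not part of the cited article: the function name `create_project` and the project name `example_study` are hypothetical, and the demonstration runs in a temporary directory so that nothing is left on disk.

```python
from pathlib import Path
import tempfile

# Subdirectories named in the project-organization list above.
SUBDIRS = ("doc", "data", "results", "src", "bin")


def create_project(root: Path, name: str) -> Path:
    """Create <root>/<name> with the standard subdirectories and return its path."""
    project = root / name
    for sub in SUBDIRS:
        # parents=True also creates the project directory itself;
        # exist_ok=True makes the function safe to run again.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


# Demonstration in a temporary directory (removed automatically afterwards).
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp), "example_study")
    print(sorted(p.name for p in project.iterdir()))
```

Keeping the layout in a single function means every new project starts from the same structure, which makes scripts that assume `data/` and `results/` portable between projects.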

5. Keeping track of changes

- Back up (almost) everything created by a human being as soon as it is created.
- Keep changes small.
- Share changes frequently.
- Create, maintain, and use a checklist for saving and sharing changes to the project.
- Store each project in a folder that is mirrored off the researcher's working machine.
- Add a file called CHANGELOG.txt to the project's docs subfolder.
- Copy the entire project whenever a significant change has been made.
- Use a version control system.

6. Manuscripts

- Write manuscripts using online tools with rich formatting, change tracking, and reference management.
- Write the manuscript in a plain text format that permits version control.

## Rules for data analysis

From http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004961

- Statistical Methods Should Enable Data to Answer Scientific Questions
- Signals Always Come with Noise
- Plan Ahead, Really Ahead
- Worry about Data Quality
- Statistical Analysis Is More Than a Set of Computations
- Keep it Simple
- Provide Assessments of Variability
- Check Your Assumptions
- When Possible, Replicate!
- Make Your Analysis Reproducible

## Data cleaning

Steps of data cleaning:

- Summarize: What is in the variable?
- Visualize: What does the distribution look like?
- Check: What potential problems do we find in the variable?
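As an illustration of the summarize, visualize, and check steps, here is a minimal sketch in plain Python. It is not the dataMaid workflow: the function names, the `ages` data, and the plausible range of 0 to 120 years are assumptions made for this example.

```python
import statistics


def summarize(values):
    """Summarize: what is in the variable?"""
    present = [v for v in values if v is not None]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


def visualize(values, bin_width):
    """Visualize: a text histogram of the non-missing values."""
    counts = {}
    for v in values:
        if v is None:
            continue
        start = (v // bin_width) * bin_width  # left edge of the bin
        counts[start] = counts.get(start, 0) + 1
    return "\n".join(f"{start:>6} | {'#' * n}" for start, n in sorted(counts.items()))


def check(values, lower, upper):
    """Check: flag missing values and values outside a plausible range."""
    problems = []
    for i, v in enumerate(values):
        if v is None:
            problems.append((i, "missing"))
        elif not lower <= v <= upper:
            problems.append((i, "out of range"))
    return problems


# Hypothetical ages in years; None marks a missing value, 420 is a likely typo.
ages = [34, 41, None, 29, 420, 38, 45, 31]
print(summarize(ages))
print(visualize(ages, bin_width=10))
print(check(ages, lower=0, upper=120))
```

The check step only flags suspicious records; deciding whether 420 should be corrected, set to missing, or kept is a judgment that should be recorded along with the other processing steps.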

See the dataMaid R package.

## Steps of the analysis

- Import the data
- Basic data exploration
- Advanced analysis/data science
- Finalize reporting.

Source: https://twitter.com/drob/status/748222386697179136/photo/1?ref_src=twsrc%5Etfw

See Statistics from A to Z -- Confusing Concepts Clarified

See StatsGuy

See rcompanion

Basic statistical terms

See https://www.camscanner.com/share/7nNnz/0/w105s11iqbodi

- `numeric`: Numeric data (approximations of the real numbers, ℝ)
- `integer`: Integer data (whole numbers, ℤ)
- `factor`: Categorical data (simple classifications, like gender)
- `ordered`: Ordinal data (ordered classifications, like educational level)
- `character`: Character data (strings)
- `raw`: Binary data

## Data in R for Excel users

## Data management

To do data science you need to be able to solve six main types of problems:

1. __Importing__ your data into your analysis environment of choice.
2. __Tidying__ your data into a consistent form.
3. __Transforming__ it to add new variables or create summaries.
4. __Visualising__ it to help refine your questions and to reveal both the mundane and the surprising.
5. __Modelling__ to scale to larger data volumes, and handle uncertainty in a principled way.
6. __Communicating__ your results to others.

### Data pipeline management

## Descriptive statistics

Kestin, I., 2009. Statistics in medicine. Anaesthesia & Intensive Care Medicine 10, 206–213.

Kestin, I., 2006. Statistics. Anaesthesia & Intensive Care Medicine 7, 135–142.

See the United Kingdom guide to charts.

### Describing data with tables

### Describing data with charts

Three types of charts:

**Comparison** – Comparison charts are used to compare one or more datasets. They can compare items or show differences over time.

**Relationship** – Relationship charts are used to show a connection or correlation between two or more variables.

**Distribution** – Distribution charts are used to show how variables are distributed over time, helping identify outliers and trends.

Describing data with charts

Examples of flawed charts from The Economist

#### Chart types

##### Bar chart

Bar charts are one of the most common ways to visualize data. Why? It’s quick to compare information, revealing highs and lows at a glance. Bar charts are especially effective when you have numerical data that splits nicely into different categories so you can quickly see trends within your data.

When to use bar charts: comparing data across categories. Examples: volume of shirts in different sizes, website traffic by origination site, percent of spending by department.

##### Line chart

Line charts are right up there with bars and pies as one of the most frequently used chart types. Line charts connect individual numeric data points. The result is a simple, straightforward way to visualize a sequence of values. Their primary use is to display trends over a period of time.

When to use line charts: viewing trends in data over time. Examples: stock price change over a five-year period, website page views during a month, revenue growth by quarter.

##### Map

When you have any kind of location data – whether it’s postal codes, state abbreviations, country names, or your own custom geocoding – you’ve got to see your data on a map. You wouldn’t leave home to find a new restaurant without a map (or a GPS anyway), would you? So demand the same informative view from your data.

When to use maps: showing geocoded data. Examples: insurance claims by state, product export destinations by country, car accidents by zip code, custom sales territories.

##### Scatter plot

Looking to dig a little deeper into some data, but not quite sure how – or if – different pieces of information relate? Scatter plots are an effective way to give you a sense of trends, concentrations and outliers that will direct you to where you want to focus your investigation efforts further.

When to use scatter plots: investigating the relationship between different variables. Examples: male versus female likelihood of having lung cancer at different ages, technology early adopters’ and laggards’ purchase patterns of smart phones, shipping costs of different product categories to different regions.

##### Bubble chart

Bubbles are not their own type of visualization but instead should be viewed as a technique to accentuate data on scatter plots or maps. People are drawn to using bubbles because the varied size of circles provides meaning about the data.

When to use bubbles: showing the concentration of data along two axes. Examples: sales concentration by product and geography, class attendance by department and time of day.

##### Histogram chart

Use histograms when you want to see how your data are distributed across groups. Say, for example, that you’ve got 100 pumpkins and you want to know how many weigh 2 pounds or less, 3-5 pounds, 6-10 pounds, etc. By grouping your data into these categories then plotting them with vertical bars along an axis, you will see the distribution of your pumpkins according to weight. And, in the process, you’ve created a histogram.

At times you won’t necessarily know which categorization approach makes sense for your data. You can use histograms to try different approaches to make sure you create groups that are balanced in size and relevant for your analysis.

When to use histograms: understanding the distribution of your data. Examples: number of customers by company size, student performance on an exam, frequency of a product defect.
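The grouping step behind the pumpkin example can be sketched as follows. The weights are made up for illustration, the group edges follow the ones named in the text, and whole-pound weights are assumed so that the groups "3-5" and "6-10" leave no gaps.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper edges of the weight groups, in pounds; the final
# label catches everything heavier than the last edge.
UPPER_EDGES = [2, 5, 10]
LABELS = ["<=2", "3-5", "6-10", ">10"]


def bin_counts(weights):
    """Count how many weights fall in each group, in label order."""
    counts = Counter(LABELS[bisect_left(UPPER_EDGES, w)] for w in weights)
    return [(label, counts.get(label, 0)) for label in LABELS]


# Hypothetical whole-pound pumpkin weights.
weights = [1, 2, 3, 4, 4, 5, 6, 8, 9, 10, 12]
for label, n in bin_counts(weights):
    print(f"{label:>5} | {'#' * n}")
```

Changing `UPPER_EDGES` and `LABELS` together is a quick way to try different categorizations and see which grouping gives a useful picture of the distribution.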

##### Heat maps

Heat maps are a great way to compare data across two categories using color. The effect is to quickly see where the intersection of the categories is strongest and weakest.

When to use heat maps: showing the relationship between two factors. Examples: segmentation analysis of target market, product adoption across regions, sales leads by individual rep.

##### Highlight table

Highlight tables take heat maps one step further. In addition to showing how data intersects by using color, highlight tables add a number on top to provide additional detail.

When to use highlight tables: providing detailed information on heat maps. Examples: the percent of a market for different segments, sales numbers by reps in a particular region, population of cities in different years.

##### Treemap

Looking to see your data at a glance and discover how the different pieces relate to the whole? Then treemaps are for you. These charts use a series of rectangles, nested within other rectangles, to show hierarchical data as a proportion to the whole. As the name of the chart suggests, think of your data as related like a tree: each branch is given a rectangle which represents how much data it comprises. Each rectangle is then sub-divided into smaller rectangles, or sub-branches, again based on its proportion to the whole. Through each rectangle’s size and color, you can often see patterns across parts of your data, such as whether a particular item is relevant, even across categories. They also make efficient use of space, allowing you to see your entire data set at once.

When to use treemaps: showing hierarchical data as a proportion of a whole. Examples: storage usage across computer machines, managing the number and priority of technical support cases, comparing fiscal budgets between years.

##### Box-and-whisker plot

Box-and-whisker plots, or boxplots, are an important way to show distributions of data. The name refers to the two parts of the plot: the box, which contains the median of the data along with the 1st and 3rd quartiles (the values below which 25% and 75% of the data fall), and the whiskers, which typically extend to the most extreme data points lying within 1.5 times the interquartile range (the difference between the 1st and 3rd quartiles) of the box. The whiskers can also be used to show the maximum and minimum points within the data.

When to use box-and-whisker plots: showing the distribution of a set of data. Examples: understanding your data at a glance, seeing how data is skewed towards one end, identifying outliers in your data.
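The numbers a boxplot draws can be computed directly. The sketch below uses the 1.5 times interquartile range convention described above; the data are made up, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since plotting packages differ slightly in how they define quartiles.

```python
import statistics


def boxplot_stats(data):
    """Return quartiles, whisker ends, and outliers using the 1.5 * IQR rule."""
    # Other quartile conventions give slightly different cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical measurements with one extreme value.
values = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(boxplot_stats(values))
```

In this example the value 30 lies beyond the upper fence, so it would be drawn as a separate point rather than being covered by the whisker.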

### Color palette for charts

- `#396AB1`
- `#DA7C30`
- `#3E9651`
- `#CC2529`
- `#535154`
- `#6B4C9A`
- `#922428`
- `#948B3D`

## Inferential statistics

### Comparing the means of two or more groups

### Comparing the proportions of two or more groups

### Associating variables

#### Basic terminology for statistical modeling

Source: Thorpe, K.E., 2017. How to construct regression models for observational studies (and how NOT to do it!). Can. J. Anaesth. 64, 461–470.

### Survival analysis

## Basic concepts

**Type I error rate (α)**: The probability of claiming that an effect exists when in fact there is no effect; usually set at 0.01 or 0.05.

**Predictor variables**: The best set of predictors needs to be chosen; the categories of each predictor need to be specified.

**Primary hypothesis**: The primary hypothesis of interest needs to be specified. GUI power programs usually provide a list of possible hypotheses after all information is specified.

**Smallest scientifically or clinically important difference**: The minimum difference in the mean values of the response variable the investigators find important.

**Variances of repeated measurements**: Variance of each of the repeated measurements needs to be specified.

**Correlations among repeated measurements**: Correlations among pairs of the repeated measurements need to be specified.

- Statistical significance: see https://www.uccs.edu/lbecker/clinsig
- Effect size: see https://www.leeds.ac.uk/educol/documents/00002182.htm and https://www.uccs.edu/lbecker/effect-size

## Clinical and statistical difference

# Probability distributions

Probability distributions

See https://www.mathsisfun.com/data/index.html

# Preparing data for analysis

Preparing data for analysis

# 10 rules for efficient statistical analysis

10 rules for effective statistical analysis

## Four types of experimental design (Gotelli)

Variables are classified as continuous or categorical and as dependent or independent. The combination suggests the analysis:

| Dependent variable | Independent: continuous | Independent: categorical |
|---|---|---|
| Continuous | Regression | ANOVA |
| Categorical | Logistic regression | Tabular |
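The classification above is a two-way lookup, which can be written down directly. This is only a sketch of the mapping in these notes; the function name `suggest_analysis` is hypothetical, and a real choice of method depends on more than the two variable types.

```python
# Keys are (dependent variable type, independent variable type).
ANALYSIS_BY_DESIGN = {
    ("continuous", "continuous"): "regression",
    ("continuous", "categorical"): "ANOVA",
    ("categorical", "continuous"): "logistic regression",
    ("categorical", "categorical"): "tabular analysis",
}


def suggest_analysis(dependent: str, independent: str) -> str:
    """Return the analysis family for a pair of variable types."""
    key = (dependent.lower(), independent.lower())
    if key not in ANALYSIS_BY_DESIGN:
        raise ValueError("variable types must be 'continuous' or 'categorical'")
    return ANALYSIS_BY_DESIGN[key]


print(suggest_analysis("continuous", "categorical"))
print(suggest_analysis("categorical", "continuous"))
```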

# Protocol for data exploration before analysis

Protocol for data exploration before analysis

**Reproducibility** – the ability to recompute results – and **replicability** – the chances other experimenters will achieve a consistent result – are two foundational characteristics of successful scientific research.

# Statistical inference

## Sample size calculation and statistical power

Sample Size Planning for Statistical Power and Accuracy in Parameter Estimation https://www.annualreviews.org/doi/10.1146/annurev.psych.59.103006.093735

## Inference

Reporting of studies that show association and causal inference: Lederer et al., 2019. Control of Confounding and Reporting of Results in Causal Inference Studies. Guidance for Authors from Editors of Respiratory, Sleep, and Critical Care Journals. Ann. Am. Thorac. Soc. 16, 22–28. https://paperpile.com/app/p/55287432-b796-0b7c-b09d-56b81abc0f85

## Which statistical analysis to use

See https://www.maximaformacion.es/master-estadistica-aplicada-con-r/blog/item/guia-para-encontrar-tu-prueba-estadistica.html

## Common errors in the analysis and interpretation of statistical test results

| Problem | Reason | Solution |
|---|---|---|
| 1. Assuming small differences are meaningful | Most small differences are due to chance, not meaningful differences. | Ask for the margin of error (i.e. half of the 95% CI): if the difference observed is smaller than the margin of error, the difference is probably due to random fluctuations in the data. |
| 2. Equating statistical significance with real world significance | Generalisations about between-group differences often ignore within-group variability or between-group similarities. | Ask for the effect size of the difference between groups and its precision (e.g. mean difference and 95% CI). |
| 3. Neglecting to look at extremes | For Normally distributed data, a small change in the group average accentuates differences at the extremes of the distribution more than differences within most of the bell shape. | When you’re dealing with group averages (which is most of the time), small group differences don’t matter much. |
| 4. Trusting coincidence | You can nearly always find an interesting pattern or correlation if you massage the data hard enough. The authors cite a correlation between the number of drownings from falling into pools and films Nicolas Cage appeared in. | Ask how reliable the observed association is: has this association only happened once or has it happened before? Can future associations be predicted? |
| 5. Getting causation backward | Correlation does not equal causation: did drownings in pools cause Nicolas Cage to appear in more films, or did Nicolas Cage appearing in more films cause more drownings? | Remember to think about reverse causality when you see an association. |
| 6. Forgetting to consider outside causes | Known as confounding: e.g. people who eat at restaurants appear to be healthier, but they are often healthier because they are richer and can afford better health care. | Remember to think about potential outside causes: when investigating a certain cause, think about what, in turn, causes that cause. |
| 7. Deceptive graphs | (This is a biggie.) E.g. vertical axis scaling can accentuate differences between groups even though differences are small. | Check graph labels along axes. Be very skeptical about unlabeled graphs. |

Louis and Chapman (2017) The seven deadly sins of statistical misinterpretation, and how to avoid them. The Conversation.
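The first row's advice, comparing an observed difference with the margin of error, can be made concrete. The sketch below assumes two independent groups and a normal approximation for the difference in means; the summary statistics are invented for illustration and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sd1, n1, sd2, n2, confidence=0.95):
    """Half-width of a normal-approximation CI for a difference in two means."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(sd1**2 / n1 + sd2**2 / n2)


# Hypothetical group means, standard deviations, and sizes.
diff = 121.0 - 119.5
moe = margin_of_error(sd1=10.0, n1=50, sd2=10.0, n2=50)

print(f"difference = {diff:.1f}, margin of error = {moe:.2f}")
if abs(diff) < moe:
    print("The difference is smaller than the margin of error.")
else:
    print("The difference exceeds the margin of error.")
```

With small samples a t-based interval would be wider than this approximation, so the sketch is, if anything, optimistic about how small a difference can be distinguished from noise.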

# Statistical analysis reporting

Reporting statistical analysis in journals

## Guide to statistical reporting

See Nieminen, P., 2020. Ten Points for High-Quality Statistical Reporting and Data Presentation. NATO Adv. Sci. Inst. Ser. E Appl. Sci. 10, 3885.

## Reporting group comparisons

## Reporting association analyses

## Reporting survival analyses

1. Sample size: We evaluated the methods that the investigators described for sample size calculation. In addition, in the studies that used multiple regression analysis we evaluated the number of the events and number of covariates in order to estimate the adequacy of power. According to Peduzzi et al., approximately ten events per covariate is appropriate in PH regression analysis [8].

2. Censoring description: We evaluated the description of censoring and whether the investigators reported this adequately, inadequately or there was no mention.

3. Survival curves: We evaluated the statistical methods used for generating survival curves. For the comparison of survival between the groups, we documented the reported methods (log-rank test or Wilcoxon test). We also noted the shape of the survival curves (evenly separated or crossing survival curves).

4. Statistical significance: The statistical test used to evaluate the difference between two survival curves is determined using a log-rank test or weighted log-rank test (e.g. Wilcoxon test). The null hypothesis of the test is that there is no difference between the two survival curves. We documented the statistical test reported in the articles.

5. Regression model: The statistical methods used for survival regression analysis were evaluated. We were interested in the regression model that the investigators used for calculating the hazard ratio (e.g. Cox PH model, extended Cox PH model or parametric survival model). We were also interested in the other regression models (time-dependent variable, competing risk analysis or repeated event analysis). In addition, the test for interaction of the variable was checked. For the studies that used multivariate regression, we assessed the description of variable selection and the strategy used for model building.

6. Check for the PH assumption: We assessed the test for PH assumption described in the articles. The assessment included methods used for checking PH assumption (graphical approach or the goodness-of-fit testing approach).

7. Model checking: We evaluated whether the investigators assessed for goodness-of-fit measures. The residual-based diagnostics were also assessed (martingale residuals, Cox-Snell residuals, Schoenfeld residuals or deviance residuals).

From Chai-Adisaksopha, C., Iorio, A., Hillis, C., Lim, W., Crowther, M., 2016. A systematic review of using and reporting survival analyses in acute lymphoblastic leukemia literature. BMC Hematol 16, 17.

Zhu, X., Zhou, X., Zhang, Y., Sun, X., Liu, H., Zhang, Y., 2017. Reporting and methodological quality of survival analysis in articles published in Chinese oncology journals. Medicine 96, e9204.

## Errors to avoid

See http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist

See http://slides.com/maartenzam/dh18-stats

### Design and Sample Size Problems

**Use of an improper effect size**

If a study is designed to detect a certain effect size with a given power, the effect size should never be the observed effect from another study, which may be estimated with error and be overly optimistic. The effect size to use in planning should be the clinically or biologically relevant effect one would regret missing. Usually the only information from prior studies that is useful in sample size estimation are (in the case of a continuous response variable with a symmetric distribution) estimates of the standard deviation or the correlation between two measurements on the same subject measured at two different times, or (in the case of a binary or time to event outcome) event probabilities in control subjects.
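To show how the two planning inputs named above enter a sample size calculation, here is a minimal sketch for comparing two means. It assumes equal group sizes, a two-sided test, and the usual normal approximation; the numbers are hypothetical. Following the paragraph above, `sd` would come from prior studies, while `delta` is the clinically relevant difference chosen by the investigators, not an effect observed elsewhere.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta`."""
    z = NormalDist().inv_cdf
    # n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / delta) ** 2, rounded up.
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)


# Hypothetical planning values: SD of 10 units, relevant difference of 5 units.
print(n_per_group(delta=5.0, sd=10.0))
# Halving the relevant difference roughly quadruples the required sample.
print(n_per_group(delta=2.5, sd=10.0))
```

Because the required size grows with the inverse square of `delta`, plugging in an over-optimistic effect from another study can leave the trial badly underpowered for the difference that actually matters.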

**Relying on standardized effect sizes**

Many researchers use Cohen's standardized effect sizes in planning a study. This has the advantage of not requiring pilot data. But such effect sizes are not biologically meaningful and may hide important issues as discussed by Lenth. Studies should be designed on the basis of effects that are relevant to the investigator and human subjects. If, for example, one plans a study to detect a one standard deviation (SD) difference in the means and the SD is large, one can easily miss a biologically important difference that happened to be much less than one SD in magnitude. Note that the SD is a measure of how subjects disagree with one another, not a measure of an effect (e.g., the shift in the mean).

### General Statistical Problems

**Inefficient use of continuous variables**

Categorizing continuous predictor or response variables into intervals causes serious statistical inference problems including bias, loss of power, and inflation of type I error.

**Relying on assessment of normality of the data**

Some analysts use tests or graphics for assessing normality in choosing between parametric and nonparametric tests. This is often the result of an unfounded belief that nonparametric rank tests are not as powerful as parametric tests. In fact on the average nonparametric tests are more powerful than their parametric counterparts, because data are non-normally distributed more often than they are Gaussian. At any rate, using an assessment of normality to choose a test relies on the assessment having nearly perfect sensitivity. If a test of normality has a large type II error, there is a high probability of choosing the wrong approach. Coupling a test of normality to a final nonparametric vs. parametric test only has the appearance of increasing power. If the normality test has a power of 1.0 one can, for example, improve on the 0.96 efficiency of the Wilcoxon test vs. the t-test when normality holds. However, once the uncertainty of the normality test is accounted for, there is no power gain.

**Inappropriate use of parametric tests**

When all that is desired is an unadjusted (for other variables) P-value and a parametric test is used, the resulting inference will not be robust to extreme values, will depend on how the response variable is transformed, and will suffer a loss of power if the data are not normally distributed. Parametric methods are more necessary when adjusting for confounding or for subject heterogeneity, or when dealing with a serially measured response variable. When one wants a unitless index of the strength of association between two continuous variables and only wants to assume that the true association is monotonic (is always decreasing or always increasing), the nonparametric Spearman's rho rank correlation coefficient is a good choice. A good nonparametric approach to getting confidence intervals for means and differences in means is the bootstrap.
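The bootstrap mentioned above can be sketched in a few lines. This is a simple percentile bootstrap for a difference in means between two independent groups; the data, the number of resamples, and the fixed seed are assumptions made for illustration, and more refined bootstrap intervals exist.

```python
import random
import statistics


def bootstrap_ci_mean_diff(a, b, n_boot=2000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), two independent groups."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    diffs.sort()
    tail = (1 - confidence) / 2
    low = diffs[round(tail * n_boot)]
    high = diffs[round((1 - tail) * n_boot) - 1]
    return low, high


# Hypothetical measurements for two groups.
treated = [12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1]
control = [10.2, 9.8, 11.1, 10.5, 9.4, 10.9, 11.3, 10.0]

low, high = bootstrap_ci_mean_diff(treated, control)
print(f"95% bootstrap CI for the mean difference: {low:.2f} to {high:.2f}")
```

Because every resampled mean is built from observed values, the limits cannot fall outside what the data make possible, which is the property the notes refer to when recommending bootstrap limits over "mean plus or minus SD" statements.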

**Inappropriate descriptive statistics**

The mean and standard deviation are not descriptive of variables that have an asymmetric distribution such as variables with a heavy right tail (which includes many clinical lab measurements). Quantiles are always descriptive of continuous variables no matter what the distribution. A good 3-number summary is the lower quartile (25th percentile), median, and upper quartile (75th percentile). The difference in the outer quartiles is a measure of subject-to-subject variability (it is an interval containing half the subjects' values). The median is always descriptive of “typical” subjects. By comparing the difference between the upper quartile and the median with the difference between the median and the lower quartile, one obtains a sense of the symmetry of the distribution. Above all don't provide descriptive statistics such as “the mean hospital cost was $10,000 plus or minus $20,000.” Nonparametric bootstrap confidence intervals will prevent impossible values being used as confidence limits.

When using the Wilcoxon-Mann-Whitney test for comparing two continuous or ordinal variables, use difference estimates that are consistent with this test. The Wilcoxon test does not test for difference in medians or means. It tests whether the Hodges-Lehmann estimate of the difference between two groups is zero. The HL-estimate is the median difference over all possible pairs of subjects, the first from group 1 and the second from group 2. See http://en.wikipedia.org/wiki/Mann-Whitney_U for an example statement of results, and a good reference for this.

Note: When there are excessive ties in the response variable such as a variable with clumping at zero, quantiles such as the median may not be good descriptive statistics (the mean with associated bootstrap confidence limits may be better), and the Hodges-Lehmann estimate does not work well.

Although seen as appealing by some, the so-called “number needed to treat” suffers from a long list of problems and is not recommended.
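The Hodges-Lehmann estimate described above is easy to compute for small samples by listing every pairwise difference. The sketch below uses made-up data chosen so that the estimate differs from the difference in medians; the function name is hypothetical.

```python
import statistics
from itertools import product


def hodges_lehmann(group1, group2):
    """Median of all pairwise differences (a group1 value minus a group2 value)."""
    return statistics.median(x - y for x, y in product(group1, group2))


# Hypothetical measurements for two small groups.
group1 = [1, 9, 10]
group2 = [2, 3, 8]

hl = hodges_lehmann(group1, group2)
median_diff = statistics.median(group1) - statistics.median(group2)

print(f"Hodges-Lehmann estimate: {hl}")
print(f"Difference in medians:   {median_diff}")
```

Here the nine pairwise differences have a median of 2, while the difference between the two group medians is 6, which illustrates why the two quantities should not be reported interchangeably. The number of pairs grows as the product of the group sizes, so this brute-force listing suits modest samples.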

**Failure to include confidence intervals**

Confidence intervals for key effects should always be included. Studies should be designed to provide sufficiently narrow confidence intervals so that the results contain useful information. See sections 3.4.4, 3.5.3, 3.7.4 of this and The End of Statistical Significance?

**Inappropriate choice of measure of change**

In a single-subject-group study in which there are paired comparisons (e.g., pre vs. post measurements), researchers too easily take for granted the appropriate measure of change (simple difference, percent change, ratio, difference of square roots, etc.). It is important to choose a change measure that can be taken out of context, i.e., is independent of baseline. See MeasureChange, STBRsylConcepts, and STBRsylEffectEvidence for more information. In general, change scores cause more problems than they solve. For example, one cannot use summary statistics on percent changes because of improper cancellation of positive and negative changes.
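A two-subject example shows the improper cancellation of percent changes. The data are hypothetical: one subject doubles and the other halves, so the group mean is unchanged. The log-ratio summary is added here only as one illustration of a measure that treats doubling and halving symmetrically; it is not prescribed by the source.

```python
from math import exp, log
from statistics import fmean

# Hypothetical paired measurements: one subject doubles, the other halves.
baseline = [50.0, 100.0]
follow_up = [100.0, 50.0]

pct_changes = [100 * (post - pre) / pre for pre, post in zip(baseline, follow_up)]
mean_pct_change = fmean(pct_changes)

# Averaging on the log scale and back-transforming gives a geometric mean ratio.
log_ratios = [log(post / pre) for pre, post in zip(baseline, follow_up)]
geometric_mean_ratio = exp(fmean(log_ratios))

print(f"group means: {fmean(baseline)} -> {fmean(follow_up)}")
print(f"percent changes: {pct_changes}, mean = {mean_pct_change}%")
print(f"geometric mean ratio = {geometric_mean_ratio}")
```

The mean percent change is +25% even though the group mean did not move, because a +100% change and a -50% change do not cancel on the percent scale.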

**Use of change scores in parallel-group designs**

When there is more than one subject group, for example a two-treatment parallel-group randomized controlled trial, it is very problematic to incorporate change scores into the analysis. First, it may be difficult to choose the appropriate change score as mentioned above (e.g., relative vs. absolute). Second, regression to the mean and measurement error often render simple change scores inappropriate. Third, when there is a baseline version of the final response variable, it is necessary to control for subject heterogeneity by including that baseline variable as a covariate in an analysis of covariance. Then the baseline variable needs to appear in both the left hand and right hand side of the regression model, making interpretation of the results more difficult. It is much preferred to keep the pure response (follow-up) variable on the left side and the baseline value on the right side of the equation.

**Inappropriate analysis of serial data (repeated measures)**

Some researchers analyze serial responses, when multiple response measurements are made per subject, as if they were from separate subjects. This will exaggerate the real sample size and make P-values too small. Some researchers still use repeated measures ANOVA even though this technique makes assumptions that are extremely unreasonable for serial data. An appropriate methodology should be used, such as generalized least squares or mixed effects models with an appropriate covariance structure, GEE, or an approach related to GEE in which the cluster bootstrap or the cluster sandwich covariance estimator is used to correct a working independence model for within-subject correlation.

**Making conclusions from large P-values: absence of evidence is not evidence of absence**

In general, the only way that a large P-value can be interpreted is for example “The study did not provide sufficient evidence for an effect.” One cannot say “P = 0.7, therefore we conclude the drug has no effect”. Only when the corresponding confidence interval excludes both clinically significant benefit and harm can one make such a conclusion. A large P-value by itself merely means that a higher sample size is required to allow conclusions to be drawn. See section 3.8 of this for details, along with this.
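The rule in the paragraph above can be written as a small decision function. This is a sketch under stated assumptions: the effect is expressed so that zero means no effect, `important_difference` is the smallest clinically important effect in either direction (a value the investigators must choose), and the wording of the returned messages is illustrative.

```python
def interpret_ci(ci_low, ci_high, important_difference):
    """Classify a confidence interval for a treatment effect (0 = no effect)."""
    if -important_difference < ci_low and ci_high < important_difference:
        # The whole interval lies inside the zone of clinically unimportant effects.
        return "clinically important benefit and harm are both excluded"
    if ci_low > 0 or ci_high < 0:
        return "evidence for an effect"
    return "inconclusive: insufficient evidence for an effect"


# Hypothetical intervals, with 5 units as the smallest important difference.
print(interpret_ci(-8.0, 12.0, important_difference=5.0))
print(interpret_ci(-1.5, 2.0, important_difference=5.0))
print(interpret_ci(6.0, 14.0, important_difference=5.0))
```

The first interval contains zero but also contains important benefit and harm, so a large P-value there supports no conclusion; only the second interval justifies saying that any effect is too small to matter.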

**Filtering**

There are many ways that authors have been seduced into taking results out of context, particularly when reporting the one favorable result out of dozens of attempted analyses. Filtering out (failing to report) the other analyses is scientifically suspect. At the very least, an investigator should disclose that the reported analyses involved filtering of some kind, and she should provide details. The context should be reported (e.g., “Although this study is part of a planned one-year follow-up of gastric safety for Cox-2 inhibitors, here we only report the more favorable short term effects of the drug on gastric side effects.”). To preserve type I error, filtering should be formally accounted for, which places the burden on the investigator of undertaking often complex Monte Carlo simulations. Here is a checklist of various ways of filtering results, all of which should be documented, and in many cases, re-thought:

- Subsets of enrolled subjects
- Selection of endpoint
- Subset of follow-up interval
- Selection of treatments
- Selection of predictors
- Selection of cutpoints for continuous variables
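A small Monte Carlo sketch can illustrate why reporting only the most favorable of several analyses inflates type I error. It assumes that the analyses are independent and that no true effect exists, so each P-value is uniform on (0, 1). Real analyses of a single dataset are usually correlated, so the numbers are only indicative. The function name and parameters are hypothetical.

```python
import random


def familywise_false_positive_rate(n_analyses, alpha=0.05, n_sim=20000, seed=1):
    """Estimate how often the smallest of several null P-values falls below alpha.

    Assumes independent analyses with no true effect, so that each
    P-value is uniform on (0, 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        smallest_p = min(rng.random() for _ in range(n_analyses))
        if smallest_p < alpha:
            hits += 1
    return hits / n_sim


for k in (1, 5, 20):
    simulated = familywise_false_positive_rate(k)
    exact = 1 - (1 - 0.05) ** k  # closed form under independence
    print(f"{k:2d} analyses: simulated {simulated:.3f}, exact {exact:.3f}")
```

The realistic case, with correlated analyses and data-dependent choices, has no simple closed form, which is why the simulations needed to account for filtering are often complex.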

There must be a complete accounting of all subjects or animals who entered the study. Response rates to follow-up assessments must be quoted, and interpretation of study results will be very questionable if more than perhaps 10% of subjects do not have response data available unless this is by design. In a randomized comparison of treatments, the “intent to treat” analysis should be emphasized.

**Missing Data**

It is not appropriate to merely exclude subjects having incomplete data from the analysis. No matter how missing data are handled, the amount of missing baseline or response data should be carefully documented, including the proportion of missing values for each variable being analyzed and a description of the types of subjects having missing variables. The latter may involve an exploratory analysis predicting the tendency for a variable to have a missing value, based on predictors that are usually not missing. When there is a significant proportion of subjects having incomplete records, multiple imputation is advisable. Adding a new category to a variable to indicate missingness renders interpretation impossible and causes serious biases. See the October 2006 issue of Journal of Clinical Epidemiology for more about this and for other useful papers about missing data imputation.

A commonly used approach to handling dropouts in clinical trials is to use the “last observation carried forward” method. This method has been proven to be completely inappropriate in all situations. One of several problems with this method is that it treats imputed (carried forward) values as if they were real measurements. This results in overconfidence in estimates of treatment effects (standard errors and P-values are too low and confidence intervals are too narrow).
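As a minimal sketch of the documentation step described above, the following computes the proportion of missing values for each variable. The records, variable names, and the use of `None` to mark missing values are assumptions made for illustration.

```python
def missing_proportions(records, variables):
    """Return the proportion of missing (None) values for each variable."""
    n = len(records)
    return {
        var: sum(1 for row in records if row.get(var) is None) / n
        for var in variables
    }


# Hypothetical records; None marks a missing value.
records = [
    {"age": 54, "sbp": 130, "cholesterol": None},
    {"age": 61, "sbp": None, "cholesterol": None},
    {"age": 47, "sbp": 118, "cholesterol": 5.2},
    {"age": None, "sbp": 142, "cholesterol": 6.1},
]

for var, prop in missing_proportions(records, ["age", "sbp", "cholesterol"]).items():
    print(f"{var}: {prop:.0%} missing")
```

This covers only the reporting of missingness; it does not perform imputation.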

**Multiple Comparison Problems**

When using traditional frequentist statistical methods, adjustment of P-values for multiple comparisons is necessary unless

- the investigator has pre-specified an ordered priority list of hypotheses, and
- the results of all hypothesis tests performed are reported in the pre-specified order, whether the P-values are low or high.

P-values should be adjusted for filtering as well as for tests that are reported in the current paper.
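The text does not name a specific adjustment method. As one possible illustration, here is a minimal sketch of the Holm step-down adjustment, a common choice that controls the familywise error rate. The function name and the example P-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


raw = [0.010, 0.040, 0.030, 0.200]  # hypothetical unadjusted P-values
print(holm_adjust(raw))
```

Each adjusted value can be compared directly with the chosen alpha level. Note that this only adjusts for the tests passed in, so unreported (filtered) tests would also need to be included.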

### Multivariable Modeling Problems

**Inappropriate linearity assumptions**

In general there is no reason to assume that the relationship between a predictor and the response is linear. However, categorizing a continuous predictor can cause major problems. Good solutions are to use regression splines or nonparametric regression.

**Inappropriate model specification**

Researchers frequently formulate their first model in such a way that it encapsulates a model specification bias that affects all later analytical steps. For example, the March 2009 issue of J Clin Epi has a paper examining the relationship between change in cholesterol and mortality. The authors never questioned whether the mortality effect was fully captured by a simple change, i.e., is the prediction equation of the form f(post)-f(pre) where f is the identity function? It is often the case that the effect of a simple difference depends on the “pre” value, that the transformation f is not linear, or that there is an interaction between pre and post. All of these effects are contained in a flexible smooth nonlinear regression spline surface (tensor spline) in three dimensions where the predictors are pre and post. One can use the surface to test the adequacy of its special case (post-pre) and to visualize all patterns.

**Use of stepwise variable selection**

Stepwise variable selection, univariable screening, and any method that eliminates “insignificant” predictor variables from the final model cause a multitude of serious problems related to bias, significance, improper confidence intervals, and multiple comparisons. Stepwise variable selection should be avoided unless backwards elimination is used with an alpha level of 0.5 or greater.

**Lack of insignificant variables in the final model**

Unless the sample size is huge, this is usually the result of the authors using a stepwise variable selection or some other approach for filtering out “insignificant” variables. Hence the presence of a table of variables in which every variable is significant is usually the sign of a serious problem. Authors frequently use strategies involving removing insignificant terms from the model without making an attempt to derive valid confidence intervals or P-values that account for uncertainty in which terms were selected (using for example the bootstrap or penalized maximum likelihood estimation). A paper in J Clin Epi (2009-03-01, Volume 62, Issue 3, Pages 232-240) cited Ockham's razor as a principle to be followed when building a model, not realizing that the parsimony obtained by using the data at hand to make modeling decisions is only apparent. Removing insignificant terms causes bias, inaccurate (too narrow) confidence intervals, and failure to preserve type I error in the resulting model's P-values, which are calculated as though the model were completely pre-specified.

**Overfitting and lack of model validation**

When a multivariable model is reported, an unbiased validation (at least an internal validation) should be reported in the paper unless

- the model terms were pre-specified, and
- the purpose of model fitting was not to report on the predictive accuracy of the model but to compute pre-specified partial test statistics, estimates, and confidence intervals for a small selected set of predictors, or
- the dataset meets the “20:1” rule.

The 20:1 rule is as follows. Let m denote the effective sample size (the number of subjects if the response variable is a fully-observed continuous one; the number of events if doing a survival analysis; the lower of the number of events and number of non-events if the response is dichotomous) and p denote the number of candidate predictor terms that were examined in any way with respect to the response variable. p includes nonlinear terms, product terms, different transformations attempted, the total number of cutoffs attempted to be applied to continuous predictors, and the number of variables dropped from the final model in a way that was unblinded to the response. If the ratio of m to p exceeds 20, the model is likely to be reliable and there is less need for the model to be validated.

When a validation is needed, the best approach is typically the bootstrap. This is a Monte Carlo simulation technique in which all steps of the model-building process (if the model was not pre-specified) are repeated for each of, say, 150 samples with replacement of size n from the original sample containing n subjects.

**Failure to validate predictive accuracy with full resolution**

When a predictive model or instrument is intended to provide absolute estimates (e.g., risk or time to event), it is necessary to validate the absolute accuracy of the instrument over the entire range of predictions that are supported by the data. It is not appropriate to use binning (categorization) when estimating the calibration curve. Instead, the calibration curve should be estimated using a method that smoothly (without assuming linearity) relates predicted values (formed from a training set) to observed values (in an independent test set, or overfitting-corrected using resampling; see Stat in Med 15:361;1996). For testing whether the calibration curve is ideal (i.e., is the 45-degree line of identity) consider using the single d.f. Spiegelhalter z-test (Stat in Med 5:421;1986). The mean absolute error and the 90th percentile of absolute calibration error are useful summary statistics. All of these quantities and tests are provided by the R rms package.
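Returning to the 20:1 rule defined in the overfitting item, the check can be sketched in a few lines. The function names and the study numbers below are hypothetical; the definitions of m and p follow the text.

```python
def effective_size(response_type, n_subjects=None, n_events=None, n_nonevents=None):
    """Effective sample size m as defined by the 20:1 rule in the text."""
    if response_type == "continuous":
        return n_subjects
    if response_type == "survival":
        return n_events
    if response_type == "binary":
        return min(n_events, n_nonevents)
    raise ValueError(f"unknown response type: {response_type}")


def meets_20_to_1(m, p):
    """True when the ratio of m to the candidate predictor terms p exceeds 20."""
    return m / p > 20


# Hypothetical binary-outcome study: 1000 subjects, 90 events,
# 12 candidate terms examined against the response.
m = effective_size("binary", n_events=90, n_nonevents=910)
print(m, m / 12, meets_20_to_1(m, 12))
```

The example shows how a study with many subjects can still fall well short of the rule when events are rare, because m is driven by the less frequent outcome.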

**Use of Imprecise Language**

It is important to distinguish rates from probabilities, odds ratios from risk ratios, and various other terms. The word risk usually means the same thing as probability. Here are some common mistakes seen in manuscripts:

- risk ratio or RR used in place of odds ratio when an odds ratio was computed
- reduction in risk used in place of reduction in odds; for example an odds ratio of 0.8 could be referred to as a 20% reduction in the odds of an event, but not as a 20% reduction in risk
- risk ratio used in place of hazard ratio when a Cox proportional hazards model is used; the proper term hazard ratio should be used to describe ratios arising from the Cox model. These are ratios of instantaneous event rates (hazard rates) and not ratios of probabilities.
- multivariate model used in place of multivariable model; when there is a single response (dependent) variable, the model is univariate. Multivariate is reserved to refer to a model that simultaneously deals with multiple response variables.
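To see why an odds ratio of 0.8 is not a 20% reduction in risk, the following sketch converts an odds ratio into the implied risk ratio at several baseline risks. The function name and the baseline risks are chosen for illustration only.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Risk in the treated group implied by a baseline risk and an odds ratio."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    treated_odds = odds_ratio * baseline_odds
    return treated_odds / (1 + treated_odds)


odds_ratio = 0.8
for baseline_risk in (0.01, 0.20, 0.50):  # hypothetical control-group risks
    treated_risk = risk_from_odds_ratio(baseline_risk, odds_ratio)
    risk_ratio = treated_risk / baseline_risk
    print(
        f"baseline risk {baseline_risk:.2f}: "
        f"risk ratio {risk_ratio:.3f}, "
        f"relative risk reduction {1 - risk_ratio:.1%}"
    )
```

The risk ratio is close to the odds ratio only when the event is rare; as the baseline risk grows, the same odds ratio corresponds to a smaller relative reduction in risk.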

**Graphics (see also the advice in the PGF manual, chapter 6)**

- Pie charts are visual disasters
- Bar charts with error bars are often used by researchers to hide the raw data and thus are often unscientific; for continuous response variables that are skewed or have for example fewer than 15 observations per category, the raw data should almost always be shown in a research paper.
- Dot charts are far better than bar charts, because they allow more categories, category names are instantly readable, and error bars can be two-sided without causing an optical illusion that distorts the perception of the length of a bar
- Directly label categories and lines when possible, to allow the reader to avoid having to read a symbol legend
- Multi-panel charts (dot charts, line graphs, scatterplots, box plots, CDFs, histograms, etc.) have been shown to be easier to interpret than having multiple symbols, colors, hatching, etc., within one panel
- Displays that keep continuous variables continuous are preferred

**Tables**

As stated in Northridge et al (see below), “The text explains the data, while tables display the data. That is, text pertaining to the table reports the main results and points out patterns and anomalies, but avoids replicating the detail of the display.” In many cases, it is best to replace tables with graphics.

### Ways Medical Journals Could Improve Statistical Reporting

- Require that the Methods section includes a detailed and reproducible description of the statistical methods.
- Require that the Methods section includes a description of the statistical software used for the analysis and sample size calculations.
- Require authors to submit a diskette with their data files as a spreadsheet or statistical software file when submitting manuscripts for publication.
- Pay an experienced biostatistician to review every manuscript.
- Require exact P values, reported consistently to 3 decimal places, rather than NS or P<0.05, unless P<0.001 or space does not permit exact P values, as in a complex table or figure.
- Require that the Methods section contains enough detail about how the sample size was calculated so that another statistician could read the report and reproduce the calculations.
- Do not allow ambiguous reporting of percentages, such as “The recurrence rate in the control group was 50% and we calculated that the sample size required to detect a 20% reduction would be 93 in each group.” Some authors mean 30% (50%-20%=30%) and some mean 40% (20% of 50% is 10%, 50%-10%=40%). Require that the authors clarify this.
- Print the Methods section in a font the same size as the rest of the paper.
- Require 95% confidence intervals for all important results, especially those supporting the conclusions. Require authors to justify the logic of using standard errors.
- Identify every statistical test used for every P value. In tables, this can be accomplished with footnotes and in figures the legend can describe the test used.
- Enforce some consistency of statistical reporting. Do not allow authors to invent names for statistical methods.
- Require that the authors describe who performed the statistical analysis. This is especially important if the analyses were performed by the biostatistics section of a pharmaceutical company.
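The recommendation on exact P values in the list above can be applied mechanically. Here is a minimal sketch; the function name and the example values are hypothetical.

```python
def format_p(p):
    """Format a P value to 3 decimal places, or as P<0.001 when smaller."""
    if not 0 <= p <= 1:
        raise ValueError("a P value must lie between 0 and 1")
    if p < 0.001:
        return "P<0.001"
    return f"P={p:.3f}"


for p in (0.0004, 0.0312, 0.2, 0.7):  # hypothetical P values
    print(format_p(p))
```

Using one helper like this throughout a manuscript keeps the reporting consistent and avoids labels such as NS.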

# Frequently asked questions

Frequently asked questions and queries about statistics

# Recommended courses

Recommended biostatistics and R courses

# Data science terms

**Big data**
Big data is a term for collections of data that are so large they can’t be processed through traditional data processing systems. These collections come from sources like mobile devices, emails, search keywords, user database information, applications, and servers. By finding ways to comb through this data, companies can identify consumer patterns and use them to predict and optimize their business.

**Data architecture**
Data architecture describes the way data is collected, stored, accessed, and used in companies and organizations. It can be seen as the roadmap for how data flows across an organization’s IT systems and applications.

**Database**
A database is an organized collection of data that is stored and accessed electronically.

**Data modeling**
Data modeling is the process of determining what kind of data is needed and how it will be structured and organized.

**Data visualization**
Data visualization is the use of graphs, charts, tables, infographics, etc. to describe and communicate the data being analyzed and the findings that have come from it.

**Relational database management system (RDBMS)**
Relational database management systems are used to organize data into tables; the data can then be accessed or reassembled without having to reorganize the database tables. Examples of RDBMS include SAP HANA and MySQL.
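As a small illustration of data being reassembled without reorganizing the tables, here is a sketch using SQLite through Python's built-in `sqlite3` module. SQLite is used only because it needs no server; the table names, columns, and values are hypothetical.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patients(id),
        sbp INTEGER
    );
    INSERT INTO patients VALUES (1, 'A'), (2, 'B');
    INSERT INTO visits VALUES (1, 1, 130), (2, 1, 124), (3, 2, 142);
    """
)

# Reassemble the data with a join; the stored tables are left unchanged.
rows = conn.execute(
    """
    SELECT p.name, COUNT(*) AS n_visits, AVG(v.sbp) AS mean_sbp
    FROM patients AS p
    JOIN visits AS v ON v.patient_id = p.id
    GROUP BY p.id
    ORDER BY p.id
    """
).fetchall()
print(rows)
conn.close()
```

The query combines both tables into a per-patient summary at read time, while the underlying tables keep their original structure.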