| Title: | Perform the Pearson-Quetelet Analysis on Two-Way Contingency Tables |
|---|---|
| Description: | Tools to perform Pearson-Quetelet analysis on two-way contingency tables. The package computes absolute and relative frequencies, Quetelet indices, Pearson-Quetelet decomposition, apex tables, and chi-square summaries for interpreting associations between categorical variables. |
| Authors: | Boris Mirkin [aut], Luca Coraggio [aut, cre], Trevor Fenner [aut], Zina Taran [aut] |
| Maintainer: | Luca Coraggio <[email protected]> |
| License: | LGPL-3 |
| Version: | 1.0.0 |
| Built: | 2026-05-20 10:24:00 UTC |
| Source: | https://github.com/cran/PQA |
Cross-classified counts of participants by BMI category at study entry and all-cause mortality outcome from the Leisure World Cohort Study (1981-2004), excluding the first 5 years of follow-up (Table 6 in the cited paper).
    bmi_mortality
A numeric matrix (also an array) with 4 rows and 2 columns:
Rows: Underweight, Normal, Overweight, Obese.
Columns: Died, Survived (Participants - Deaths).
Values: Frequencies (counts).
BMI thresholds at study entry:
Underweight: BMI < 18.5
Normal weight: BMI 18.5-24.9
Overweight: BMI 25.0-29.9
Obese: BMI >= 30
Category-level metadata (excluding first 5 years):
Underweight: median BMI 17.8, range 12.4-18.5, participants 390, deaths 352
Normal: median BMI 22.4, range 18.5-25.0, participants 7611, deaths 6091
Overweight: median BMI 26.5, range 25.0-29.9, participants 2937, deaths 2345
Obese: median BMI 31.6, range 30.0-54.1, participants 437, deaths 339
Totals in this dataset: 11,375 participants and 9,127 deaths.
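As a quick consistency check, the table can be rebuilt from the category-level figures listed above. This is a minimal base-R sketch: the object name `bmi_tab` is hypothetical, and the dimnames of the packaged `bmi_mortality` object may differ.

```r
# Category-level figures copied from the metadata above
# (excluding the first 5 years of follow-up).
participants <- c(Underweight = 390, Normal = 7611, Overweight = 2937, Obese = 437)
deaths       <- c(Underweight = 352, Normal = 6091, Overweight = 2345, Obese = 339)

# Two-column layout described above: Died, Survived (Participants - Deaths).
# `bmi_tab` is a hypothetical stand-in, not the packaged object itself.
bmi_tab <- cbind(Died = deaths, Survived = participants - deaths)

bmi_tab
sum(bmi_tab)      # total participants
colSums(bmi_tab)  # deaths and survivors
```

The grand total and the Died column total should agree with the totals stated above.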
Corrada, Maria M., Kawas, Claudia H., Mozaffar, Farah, and Paganini-Hill, Annlia (2006). Association of Body Mass Index and Weight Change with All-Cause Mortality in the Elderly. American Journal of Epidemiology, 163(10), 938-949. Table 6, values excluding the first 5 years of follow-up. doi:10.1093/aje/kwj114
    data(bmi_mortality)
    bmi_mortality
    rowSums(bmi_mortality)
    colSums(bmi_mortality)
    sum(bmi_mortality)
Cross-classified counts of participants by BMI category at study entry and all-cause mortality outcome from the Leisure World Cohort Study (1981-2004), using the total sample values reported in Table 6 of the cited paper.
    bmi_mortality_all
A numeric matrix (also an array) with 4 rows and 2 columns:
Rows: Underweight, Normal, Overweight, Obese.
Columns: Died, Survived (Participants - Deaths).
Values: Frequencies (counts).
BMI thresholds at study entry:
Underweight: BMI < 18.5
Normal weight: BMI 18.5-24.9
Overweight: BMI 25.0-29.9
Obese: BMI >= 30
Category-level metadata (total sample):
Underweight: median BMI 17.8, range 12.4-18.5, participants 556, deaths 518
Normal: median BMI 22.4, range 18.5-25.0, participants 9021, deaths 7501
Overweight: median BMI 26.5, range 25.0-29.9, participants 3376, deaths 2784
Obese: median BMI 31.6, range 30.0-54.1, participants 498, deaths 400
Totals in this dataset: 13,451 participants and 11,203 deaths.
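The two Table 6 variants in this manual can be compared directly. The sketch below uses only the figures quoted in the two dataset descriptions; all object names are hypothetical, and it does not touch the packaged objects.

```r
# Figures copied from the two dataset descriptions in this manual.
bmi <- c("Underweight", "Normal", "Overweight", "Obese")
participants_all <- setNames(c(556, 9021, 3376, 498), bmi)  # total sample
deaths_all       <- setNames(c(518, 7501, 2784, 400), bmi)
participants_5y  <- setNames(c(390, 7611, 2937, 437), bmi)  # excluding first 5 years
deaths_5y        <- setNames(c(352, 6091, 2345, 339), bmi)

# Participants and deaths removed by the 5-year exclusion.
dropped_participants <- participants_all - participants_5y
dropped_deaths       <- deaths_all - deaths_5y

# Survivor counts under each variant.
survived_all <- participants_all - deaths_all
survived_5y  <- participants_5y - deaths_5y

rbind(dropped_participants, dropped_deaths)
identical(survived_all, survived_5y)
```

On these figures the two rows coincide, which is consistent with the exclusion removing only participants who died within the first 5 years. The Survived column is therefore the same in both tables, and only the Died column differs.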
Corrada, Maria M., Kawas, Claudia H., Mozaffar, Farah, and Paganini-Hill, Annlia (2006). Association of Body Mass Index and Weight Change with All-Cause Mortality in the Elderly. American Journal of Epidemiology, 163(10), 938-949. Table 6, total sample values. doi:10.1093/aje/kwj114
    data(bmi_mortality_all)
    bmi_mortality_all
    rowSums(bmi_mortality_all)
    colSums(bmi_mortality_all)
    sum(bmi_mortality_all)
Performs Pearson-Quetelet analysis (PQA) to examine associations between categorical variables through the Quetelet index and its decomposition of the chi-square statistic.
    pqa(x)
| x | A two-way |
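The Quetelet index is computed as $q_{ij} = \frac{p_{ij}}{p_{i+}\,p_{+j}} - 1$, where $p_{ij}$ is the relative frequency of cell $(i, j)$ and $p_{i+}$, $p_{+j}$ are the row and column marginal proportions, so 0 indicates
independence, positive values indicate higher-than-expected frequency, and negative values
indicate lower-than-expected frequency. The decomposition pq equals $p_{ij}\,q_{ij}$
and sums to $\phi^2 = \chi^2 / N$; apex rescales pq to percentage contributions.
When $\phi^2 = 0$ (perfect independence), apex is returned as a zero table.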
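The quantities described above can be sketched in base R. This is an illustration under stated assumptions, not the package's own code: it assumes the standard definitions q = p / (row proportion × column proportion) − 1 and pq = p × q, and it assumes apex is 100 × pq divided by the sum of pq. The input counts are those of Example 2 in this manual; the variable names are hypothetical.

```r
# Counts from the matrix used in Example 2 of this manual.
tab <- matrix(c(10, 20, 15, 25), nrow = 2,
              dimnames = list(Gender = c("Male", "Female"),
                              Preference = c("A", "B")))

n <- sum(tab)
p <- tab / n                                    # relative frequencies
expected_p <- outer(rowSums(p), colSums(p))     # proportions under independence

q  <- p / expected_p - 1                        # assumed Quetelet index
pq <- p * q                                     # assumed Pearson-Quetelet terms
phi2 <- sum(pq)                                 # should equal chi-square / N

# Assumed apex rescaling: percentage share of each cell in phi2,
# with a zero table under perfect independence.
apex <- if (phi2 > 0) 100 * pq / phi2 else pq * 0

# Cross-check against the uncorrected Pearson statistic from base R.
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
all.equal(n * phi2, chi2)
```

Cells with negative q contribute negative terms to pq, so individual apex entries can be negative even though they sum to 100.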
The function automatically handles missing factor/level names and assesses chi-square validity based on expected frequencies:
flag = 0: Valid.
flag = 1: Unreliable (min. expected frequency < 5).
flag = 2: Cannot be computed (min. expected frequency < 1 or df = 0).
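The flag rules listed above can be expressed as a small helper. This is a hedged sketch that mirrors the documented thresholds only; `chisq_flag` is a hypothetical name, not a function exported by the package, and the package's internal checks may differ in detail.

```r
# Hypothetical helper reproducing the documented validity rules.
chisq_flag <- function(tab) {
  # Expected counts under independence.
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)

  if (df == 0 || min(expected) < 1) return(2L)  # cannot be computed
  if (min(expected) < 5) return(1L)             # unreliable
  0L                                            # valid
}

chisq_flag(matrix(c(10, 20, 15, 25), nrow = 2))  # all expected counts >= 5
chisq_flag(matrix(c(3, 4, 5, 6), nrow = 2))      # smallest expected count is about 3.1
chisq_flag(matrix(c(1, 0, 1, 20), nrow = 2))     # smallest expected count is below 1
chisq_flag(matrix(c(10, 20), nrow = 1))          # single row, so df = 0
```

The degenerate cases are tested first so that a table with df = 0 is never reported as merely unreliable.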
An object of class pqa, which is a list containing:
abs: Absolute frequencies (counts).
rel: Relative frequencies (proportions).
q: Quetelet index values, measuring relative change in probability.
pq: Pearson-Quetelet decomposition of the chi-square statistic.
apex: Percentage contributions of each cell to the chi-square statistic.
chisq: A list of class pqa.chisq with test results (stat, df, pval) and a validity flag.
table for creating contingency tables,
chisq.test for chi-square tests
    # Example 1: Using the built-in usa_voting_prefs dataset
    data(usa_voting_prefs)
    result <- pqa(usa_voting_prefs)
    print(result$abs)    # View absolute frequencies
    print(result$chisq)  # View chi-square test results

    # Example 2: Using a matrix (converted to table first)
    data_matrix <- matrix(c(10, 20, 15, 25), nrow = 2, ncol = 2)
    dimnames(data_matrix) <- list(Gender = c("Male", "Female"), Preference = c("A", "B"))
    result <- pqa(as.table(data_matrix))
Displays a summary of the available components within a pqa object.
    ## S3 method for class 'pqa'
    print(x, pp = NULL, ...)
| x | A |
| pp | Logical; if |
| ... | Further arguments passed to or from other methods. |
Components include absolute (abs) and relative (rel) frequencies,
Quetelet indices (q), Pearson-Quetelet decomposition (pq),
apex (apex), and chi-square results (chisq).
Invisibly returns the input object.
pqa, summary.pqa, print.pqa.subtable
    data(usa_voting_prefs)
    qt <- pqa(usa_voting_prefs)
    print(qt)
Formatted print method for pqa.chisq objects, showing test statistics
and validity assessments.
    ## S3 method for class 'pqa.chisq'
    print(x, pp = NULL, ...)
| x | A |
| pp | Logical; if |
| ... | Further arguments passed to or from other methods. |
Displays the null hypothesis, chi-square statistic, degrees of freedom, and p-value. Includes warnings if the test is unreliable (expected frequencies < 5) or cannot be computed.
Invisibly returns the input object.
    data(usa_voting_prefs)
    qt <- pqa(usa_voting_prefs)
    print(qt$chisq)
Formatted print method for pqa.subtable components such as absolute
frequencies, relative frequencies, Quetelet indices, decompositions, and
apex.
    ## S3 method for class 'pqa.subtable'
    print(x, pp = NULL, ...)
| x | A |
| pp | Logical; if |
| ... | Further arguments passed to or from other methods. |
Formatting (rounding, scaling, and marginals) automatically adapts to the subtable type:
abs, rel, pq: shown with 4 decimal places and marginals.
q: shown as percentages with 2 decimal places and no marginals.
apex: shown as percentages with 2 decimal places and marginals.
If pp = FALSE, the raw matrix-like object is printed via
print.AsIs().
Invisibly returns the input object.
pqa, print.pqa, print.pqa.chisq
    data(usa_voting_prefs)
    qt <- pqa(usa_voting_prefs)
    print(qt$abs)
    print(qt$q)
Prints a textual summary of a pqa object, including absolute frequencies,
chi-square test output, Quetelet signals of association/indifference, and
apex-based contribution notes.
    ## S3 method for class 'pqa'
    summary(object, ...)
| object | A |
| ... | Further arguments passed to or from other methods. |
The summary output includes:
Absolute Frequencies: Contingency table with margins.
Chi-square Test: Test statistics, flag, and significance messages (when the test is considered valid/reliable).
Association Analysis: Cell-level signals for strong associations
(|q| > 30%) and row/column indifference patterns
(|q| < 10% for all cells in a row/column).
Apex Notes: "Odd" row/column contributions and the overall positive-vs-negative apex balance.
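The association thresholds above can be illustrated in base R. This sketch is not the logic of `summary.pqa` itself: it assumes the standard Quetelet definition q = p / (row proportion × column proportion) − 1, applies the documented 30% and 10% cut-offs, and uses counts rebuilt from the `bmi_mortality` metadata in this manual. All object names are hypothetical.

```r
# Counts rebuilt from the bmi_mortality metadata (excluding first 5 years).
participants <- c(Underweight = 390, Normal = 7611, Overweight = 2937, Obese = 437)
deaths       <- c(Underweight = 352, Normal = 6091, Overweight = 2345, Obese = 339)
tab <- cbind(Died = deaths, Survived = participants - deaths)

p <- tab / sum(tab)
q <- p / outer(rowSums(p), colSums(p)) - 1  # assumed Quetelet index

# Strong associations: |q| > 30%.
strong_cells <- which(abs(q) > 0.30, arr.ind = TRUE)

# Indifference: |q| < 10% for every cell in a row or column.
indifferent_rows <- rownames(q)[apply(abs(q) < 0.10, 1, all)]
indifferent_cols <- colnames(q)[apply(abs(q) < 0.10, 2, all)]

round(100 * q, 2)   # q shown as percentages
strong_cells
indifferent_rows
indifferent_cols
```

On these counts, only the Underweight/Survived cell exceeds the 30% threshold, and the Normal and Overweight rows fall entirely within the 10% band; neither column qualifies as indifferent.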
Invisibly returns the input pqa object.
pqa, print.pqa, print.pqa.subtable,
print.pqa.chisq
    # Create a pqa from the built-in usa_voting_prefs dataset and get summary
    data(usa_voting_prefs)
    qt <- pqa(usa_voting_prefs)

    # Get comprehensive summary
    summary(qt)
A cross-classified data table from the British Crime Survey (2007-2008) showing the relationship between the perceived frequency of rubbish on streets and crime victimization status. This dataset is useful for illustrating contingency table analysis and chi-square tests of independence in statistical education and research.
    uk_crime_rubbish
An object of class table
with 4 rows (Rubbish on street categories) and 2 columns (Crime victimization status):
Very common: Rubbish on street is very common
Fairly common: Rubbish on street is fairly common
Not very common: Rubbish on street is not very common
Not at all common: Rubbish on street is not at all common
Not a victim of crime: Respondent was not a victim of crime
Victim of crime: Respondent was a victim of crime
Frequencies or counts of survey respondents (integer numbers).
BMRB Social Research and Home Office, Research, Development and Statistics Directorate (2022). British Crime Survey, 2007-2008 (data collection), 4th Edition. UK Data Service, SN: 6066. doi:10.5255/UKDA-SN-6066-2
    # Load the dataset into the workspace
    data(uk_crime_rubbish)

    # Display the entire table
    print(uk_crime_rubbish)

    # Calculate marginal totals (row sums and column sums)
    rowSums(uk_crime_rubbish)
    colSums(uk_crime_rubbish)

    # Perform chi-square test of independence
    chisq.test(uk_crime_rubbish)
A dataset containing US mortality statistics by age group and gender, comparing 2020 deaths (including COVID-19 impact) with 2015-2019 averages. Includes all-cause deaths, non-COVID-19 deaths, and population data.
    us_covid_mortality
A data.frame with 22 rows (11 age groups × 2 genders) and 8 variables:
Character vector: age groups (<1, 1-4, 5-14, 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, 85+)
Character vector: "Male" or "Female"
Numeric: Total deaths in 2020
Numeric: Non-COVID-19 deaths in 2020
Numeric: Percentage of deaths attributed to COVID-19
Numeric: Population in 2020
Numeric: Average deaths for 2015-2019 period
Numeric: Average population for 2015-2019 period
Jacobson, Sheldon H. and Jokela, Janet A. (2021). Beyond COVID-19 Deaths during the COVID-19 Pandemic in the United States. Health Care Management Science, 24, 661-665. doi:10.1007/s10729-021-09570-4
    # Load the dataset
    data(us_covid_mortality)

    # View the structure
    str(us_covid_mortality)

    # Summary statistics by gender
    aggregate(Deaths_2020 ~ Gender, data = us_covid_mortality, FUN = sum)

    # COVID-19 impact analysis
    us_covid_mortality$COVID_Impact <- with(
      us_covid_mortality,
      Deaths_2020 - Average_Deaths_2015_2019
    )
    summary(us_covid_mortality$COVID_Impact)
Cross-classified counts of US construction fall accidents by occupation and injury degree, derived from the 2000-2020 data analysis reported by Halabi et al. (2022). The table summarizes how fall accidents are distributed across 17 occupation groups and 3 injury-severity categories.
    us_fall_accidents
An object of class table with 17 rows (occupation groups) and
3 columns (injury degree):
Rows: Roofers, Construction laborers, Carpenters,
Laborers, except construction, Supervisors and Engineers,
Painters, plasterers, construction and maintenance,
Installers and Repairers, Structural metal workers,
Operators, Electricians, Truck drivers, heavy,
Technicians, Mechanics, Janitors and cleaners,
Helpers, construction trades, Installers (Drywalls, elevators),
Plumbers, Sales engineers, workers.
Columns: Fatality, Hospitalized, Non Hospitalized.
Values: Frequencies (counts of fall accidents).
Totals in this dataset: 15,495 accidents overall, including 5,701 fatal cases, 8,955 hospitalized cases, and 839 non-hospitalized cases.
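The column totals quoted above can be turned into shares with a few lines of base R. This sketch uses only the three totals stated in this description; the vector name `injury_totals` is hypothetical and the packaged table is not needed to run it.

```r
# Column totals quoted in the dataset description.
injury_totals <- c(Fatality = 5701, Hospitalized = 8955, "Non Hospitalized" = 839)

sum(injury_totals)                          # overall number of accidents
round(100 * prop.table(injury_totals), 1)   # percentage share by injury degree
```

On these totals, fatalities account for roughly 37% of recorded fall accidents, hospitalizations for about 58%, and non-hospitalized cases for about 5%.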
The table stored in the package uses abbreviated row names for compact display; the full occupation labels are listed above for readability.
The largest occupation groups by total number of accidents are
Roofers (2,967), Construction laborers (2,725), and
Carpenters (1,665).
Halabi, Y., Xu, H., Long, D., Chen, Y., Yu, Z., Alhaek, F., and Alhaddad, W. (2022). Causal Factors and Risk Assessment of Fall Accidents in the US Construction Industry: A Comprehensive Data Analysis (2000-2020). Safety Science, 146, 105537. Table 6 (p. 8), "Fall accidents distributed by occupation and injury degree". doi:10.1016/j.ssci.2021.105537
    data(us_fall_accidents)
    us_fall_accidents
    rowSums(us_fall_accidents)
    colSums(us_fall_accidents)
    sum(us_fall_accidents)
A cross-classified data table from the National Survey of Children's Health showing school readiness for children aged 3-5 years by the highest level of education of an adult in the household. The table reports nationwide counts for children who are on track versus those who need support.
    usa_toddlers
An object of class table
with 4 rows (parent education categories) and 2 columns (school readiness):
Less than high school
High School/GED
College/Technical
College or more
On track
Need support
Frequencies or counts of children (integer numbers).
Totals in this dataset: 23,176 children overall, including 15,964 classified
as On track and 7,212 as Need support.
The largest parent-education group is College or more (15,718 children),
followed by College/Technical (4,388 children).
Data Resource Center for Child & Adolescent Health (2024). National Survey of Children's Health: School Readiness (Age 3-5 Years) by Parent Education. Nationwide tabulation based on the highest level of education of an adult in the household. https://www.childhealthdata.org/ Accessed 30 October 2024.
    # Load the dataset into the workspace
    data(usa_toddlers)

    # Display the table
    print(usa_toddlers)

    # Calculate marginal totals
    rowSums(usa_toddlers)
    colSums(usa_toddlers)
    sum(usa_toddlers)
A cross-classified data table presenting the voting preferences of USA residents classified by their income category, according to a survey by the Pew Research Center (2014). These data are typically used to illustrate computations and contingency analyses in statistical scenarios.
    usa_voting_prefs
An object of class table
with 4 rows (Income Categories) and 3 columns (Political Affiliations):
I: Less than $30,000
II: More than $30,000 but less than $50,000
III: More than $50,000 but less than $100,000
IV: $100,000 or more
R: Republican or leaning toward Republican
U: Undecided
D: Democrat or leaning toward Democrat
Frequencies or counts of respondents (integer numbers).
Pew Research Center (2014). Religious Landscape Study: Compare Party Affiliation by Income Distribution. https://www.pewresearch.org/religion/religious-landscape-study/compare/party-affiliation/by/income-distribution/ Accessed 08 July 2022.
    # Load the dataset into the workspace
    data(usa_voting_prefs)

    # Display the entire table
    print(usa_voting_prefs)

    # Calculate marginal totals (row sums and column sums)
    rowSums(usa_voting_prefs)
    colSums(usa_voting_prefs)