This repository contains the full pipeline for preparing, analyzing, and visualizing survey responses for the "Will Agents Replace Us?" preprint project. All scripts, outputs, and documentation are aligned with the final manuscript and reproducible environment.
- scripts/01_prepare.py: Cleans and processes the raw survey data, expands JSON answers, extracts region info, and outputs tidy data files.
- scripts/02_explore_v2.py: Generates all exploratory figures (bar charts, grid, heatmap, mosaic) and saves summary statistics.
- scripts/03_infer_v2.py: Performs all inferential analyses (pairwise tests, MCA, K-Modes clustering, logistic regression) and generates all results and manuscript figures.
- scripts/csv_to_md.py: Utility script that converts the K-Modes cluster table CSV (results/kmodes_table1.csv) to Markdown and LaTeX tables for manuscript inclusion.
- data/raw/: Place raw survey data files here (e.g., survey_responses_rows_20250512.csv).
- data/: Output directory for cleaned data and extracted text.
- manuscript/figs/: All figures for the manuscript (auto-generated).
- results/: All result tables, JSONs, and model outputs (auto-generated).
- manuscript/article.tex: The main LaTeX manuscript referencing all canonical outputs.
- manuscript/article.pdf: The compiled PDF of the manuscript.
Install dependencies (Recommended: Conda)
This project requires Python 3.11 and the dependencies listed in environment.yml:
conda env create -f environment.yml
conda activate ai_survey
Prepare data
Place your raw survey CSV file in the data/raw/ directory. By default, the script expects a file named survey_responses_rows_20250512.csv.
Run the preparation script
python scripts/01_prepare.py
This will generate:
- data/clean_survey.parquet: Cleaned and expanded survey data.
- data/q11.txt: Free-text responses to question 11.
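The preparation step expands JSON answers and derives a region. The sketch below shows one way those two operations can work with pandas. It is not the code in scripts/01_prepare.py: the column names "answers" and "timezone", the JSON layout, and the continent-prefix rule for IANA timezone names are all assumptions made for illustration.

```python
import json

import pandas as pd

# Hypothetical raw layout (assumption): one row per respondent, an "answers"
# column holding a JSON object keyed by question ID, and a "timezone" column
# holding an IANA name such as "Europe/Berlin".
raw = pd.DataFrame(
    {
        "id": [1, 2],
        "answers": ['{"Q1": "Yes", "Q2": "No"}', '{"Q1": "No", "Q2": "Unsure"}'],
        "timezone": ["Europe/Berlin", "America/New_York"],
    }
)


def expand_answers(df: pd.DataFrame, col: str = "answers") -> pd.DataFrame:
    """Parse the JSON column and spread each question into its own column."""
    parsed = df[col].map(json.loads)
    wide = pd.json_normalize(parsed.tolist())
    wide.index = df.index  # keep rows aligned with the original frame
    return pd.concat([df.drop(columns=[col]), wide], axis=1)


def region_from_timezone(tz) -> str:
    """Coarse region proxy: the continent prefix of an IANA timezone name."""
    if not isinstance(tz, str) or "/" not in tz:
        return "Unknown"  # e.g. "UTC" or missing metadata
    return tz.split("/", 1)[0]


tidy = expand_answers(raw)
tidy["region"] = tidy["timezone"].map(region_from_timezone)
print(tidy[["id", "Q1", "Q2", "region"]])
```

Writing the result with DataFrame.to_parquet requires a Parquet engine such as pyarrow or fastparquet to be installed in the environment.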
Run exploratory analysis
python scripts/02_explore_v2.py
This will generate:
- Individual bar charts for Q1-Q10: manuscript/figs/Q1_barh.png, ..., manuscript/figs/Q10_barh.png
- Grid of all questions: manuscript/figs/all_questions_grid_improved.png
- Heatmap: manuscript/figs/all_questions_heatmap.png
- Mosaic plot: manuscript/figs/Q1xQ3_mosaic_supplementary_s1.png
- Cramér's V heatmap: manuscript/figs/cramers_v_heatmap_figure2.png
- Proportion summary: results/all_questions_results.json
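The Cramér's V heatmap summarizes association strength between pairs of categorical questions. As a rough sketch of the quantity involved, V = sqrt(chi2 / (n * (min(r, k) - 1))) for an r x k contingency table with n responses. The helper names below are hypothetical, and the script may differ in details (for example, it might apply a bias correction, which this sketch does not).

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V for two categorical series (no bias correction)."""
    table = pd.crosstab(x, y)
    # correction=False turns off Yates' continuity correction on 2x2 tables
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, k = table.shape
    denom = n * (min(r, k) - 1)
    return float(np.sqrt(chi2 / denom)) if denom > 0 else float("nan")


def cramers_v_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Symmetric matrix of pairwise Cramér's V, ready for a heatmap."""
    cols = list(df.columns)
    # Diagonal set to 1: each variable is fully associated with itself
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            out.loc[a, b] = out.loc[b, a] = cramers_v(df[a], df[b])
    return out
```

The resulting matrix can be passed to any heatmap routine; only the upper or lower triangle carries new information.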
Run inferential analysis
python scripts/03_infer_v2.py
This will generate:
- Pairwise test results: results/pairwise_tests.csv, results/pairwise_summary.json, results/pairwise_matrices.json
- MCA outputs: results/mca_inertia.json, results/mca_row_coordinates.csv, manuscript/figs/mca_biplot_clusters_figure3.png
- K-Modes clustering: results/kmodes_results.json, results/kmodes_table1.csv, results/kmodes_table1.json, results/kmodes_elbow.png
- Logistic regression: results/logit_deployment_summary.txt, results/logit_deployment_coefs.csv, results/logit_deployment_vif.csv, manuscript/figs/logit_forest_plot_figure4.png, manuscript/figs/logit_forest_plot_significant_only.png
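K-Modes adapts k-means to categorical answers: distance is the number of questions on which two respondents differ, and each cluster centre is the per-question mode. The minimal NumPy sketch below illustrates that loop. It is not the implementation used by scripts/03_infer_v2.py (which may rely on a library); the function name, the integer coding of answers, and the initialization from distinct observed rows are assumptions. The returned cost is the quantity an elbow plot such as results/kmodes_elbow.png typically tracks across values of k.

```python
import numpy as np


def kmodes(X, k: int, n_iter: int = 20, seed: int = 0):
    """Minimal k-modes for non-negative integer-coded categorical data.

    Dissimilarity is simple matching (count of attributes that differ);
    each cluster centre is the column-wise mode of its members.
    """
    X = np.asarray(X, dtype=int)
    uniq = np.unique(X, axis=0)
    if len(uniq) < k:
        raise ValueError("need at least k distinct rows")
    rng = np.random.default_rng(seed)
    # Start from k distinct observed profiles so no two centres coincide
    modes = uniq[rng.choice(len(uniq), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # (n_rows, k) matrix of mismatch counts against each mode
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous mode
                modes[j] = [np.bincount(col).argmax() for col in members.T]
    # Final assignment and total mismatch cost against the final modes
    dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
    labels = dist.argmin(axis=1)
    cost = int(dist.min(axis=1).sum())
    return labels, modes, cost


# Toy data: rows are respondents, columns are integer-coded answers
X = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0],
              [1, 1, 1], [1, 1, 0], [1, 1, 1]])
labels, modes, cost = kmodes(X, k=2)
print(labels, modes.tolist(), cost)
```

Because k-modes can converge to a local optimum, real analyses usually run several initializations and keep the lowest-cost solution.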
(Optional) Convert clustering results to Markdown/LaTeX
To generate Markdown and LaTeX tables from the K-Modes clustering results for manuscript inclusion:
python scripts/csv_to_md.py
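For readers curious what such a conversion involves, here is a standard-library sketch that turns CSV text into a Markdown pipe table and a booktabs-style LaTeX tabular (booktabs being one of the packages the manuscript needs). It is an illustration under stated assumptions, not the contents of scripts/csv_to_md.py; the function names are hypothetical and the LaTeX escaping covers only a few common special characters.

```python
import csv
import io

# Only a handful of LaTeX specials are handled here (assumption: cluster
# tables contain plain labels and numbers).
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#", "$": r"\$"}


def latex_escape(s: str) -> str:
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in s)


def csv_to_markdown(text: str) -> str:
    header, *body = list(csv.reader(io.StringIO(text)))

    def fmt(cells):
        # Escape pipes so cell text cannot break the table
        return "| " + " | ".join(c.replace("|", r"\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


def csv_to_latex(text: str) -> str:
    header, *body = list(csv.reader(io.StringIO(text)))
    spec = "l" * len(header)  # left-align every column
    lines = [rf"\begin{{tabular}}{{{spec}}}", r"\toprule",
             " & ".join(map(latex_escape, header)) + r" \\", r"\midrule"]
    lines += [" & ".join(map(latex_escape, row)) + r" \\" for row in body]
    lines += [r"\bottomrule", r"\end{tabular}"]
    return "\n".join(lines)
```

Usage: read results/kmodes_table1.csv into a string and pass it to either function.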
All figures and tables referenced in manuscript/article.tex are generated by the above scripts and saved in manuscript/figs/ and results/. To fully reproduce the manuscript:
- Run all three main scripts in order as above.
- Compile manuscript/article.tex to PDF using your preferred LaTeX toolchain (e.g., pdflatex, latexmk, or an online LaTeX editor).
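Because the scripts must run in order, a small driver can chain them and stop at the first failure. The sketch below is a hypothetical convenience helper, not part of the repository; it assumes you call it from the repository root so the relative script paths resolve.

```python
import subprocess
import sys
from pathlib import Path

# Order taken from this README; relative paths assume the repository root
PIPELINE = (
    "scripts/01_prepare.py",
    "scripts/02_explore_v2.py",
    "scripts/03_infer_v2.py",
)


def run_pipeline(scripts=PIPELINE, python=sys.executable):
    """Run each script in order, stopping at the first failure."""
    for script in scripts:
        if not Path(script).exists():
            raise FileNotFoundError(f"{script} not found; run from the repository root")
        print(f"==> {script}", flush=True)
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run([python, script], check=True)
```

Usage: from the repository root, call run_pipeline() (for instance after saving the helper as a hypothetical run_all.py). Using sys.executable keeps the scripts on the same interpreter as the active ai_survey environment.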
To generate a PDF of the manuscript, run the following command from the manuscript/ directory (after all scripts have been run):
pdflatex article.tex
- The output will be saved as manuscript/article.pdf. If cross-references or figure numbers appear as "??", run the command a second time.
- This works both on your local machine and inside the Docker container.
- Note: Rendering to PDF requires a working LaTeX installation with the tabularx and booktabs packages. If you encounter errors like "Environment tabularx undefined", install a full TeX distribution (e.g., TeX Live, MacTeX, or TinyTeX). See Overleaf's documentation for details and troubleshooting.
- Region Extraction: The region is extracted from timezone metadata and is a coarse proxy (may be inaccurate due to VPNs, travel, or shared infrastructure). See manuscript Methods/Limitations for discussion.
- Hardcoded Cleaning: Merging of Q3 and Q5 categories is done via string matching in scripts/01_prepare.py. This is brittle and not robust to changes in survey wording. It is acceptable for this analysis, but future work should use a config file or reference questions.json.
- Qualitative Analysis (Q11): Only a manual thematic summary is performed for Q11 (n=11). No further scripting or automation is included.
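As one possible shape for the config-driven cleaning suggested above, the sketch below keeps an explicit answer-to-category mapping per question and fails loudly when it meets an answer it does not know, so a change in survey wording surfaces as an error instead of a silent mis-merge. Everything here is hypothetical: the JSON structure, the function name, and especially the Q3/Q5 answer labels, which are invented placeholders rather than the survey's real options.

```python
import json

import pandas as pd

# Hypothetical config (assumption): question ID -> {raw answer: merged category}.
# The labels below are placeholders, not the survey's actual wording.
CONFIG_JSON = """
{
  "Q3": {"Strongly agree": "Agree", "Agree": "Agree",
         "Disagree": "Disagree", "Strongly disagree": "Disagree"},
  "Q5": {"Within 5 years": "Soon", "Within 10 years": "Later"}
}
"""
MERGES = json.loads(CONFIG_JSON)


def apply_category_merges(df: pd.DataFrame, merges: dict, strict: bool = True) -> pd.DataFrame:
    """Map raw answers to merged categories using an explicit lookup table."""
    out = df.copy()
    for question, table in merges.items():
        if question not in out.columns:
            continue
        unmapped = set(out[question].dropna().unique()) - set(table)
        if unmapped and strict:
            # Surface wording changes instead of silently leaving them unmerged
            raise ValueError(f"{question}: unmapped answers {sorted(unmapped)}")
        # Unknown values (only reachable when strict=False) pass through unchanged
        out[question] = out[question].map(lambda v: table.get(v, v))
    return out
```

Keeping the mapping in a data file means a reworded answer option requires editing one JSON entry rather than a string-matching rule in code.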
- Build the Docker image:
docker build -t ai-agent-survey .
- Run the container:
docker run -it --rm -v "$PWD":/workspace ai-agent-survey
- Run the scripts as above inside the container.
- All code is implemented as Python scripts (.py), not Jupyter notebooks, in accordance with project rules.
- The project workflow follows: specs review → data review → todo.md planning → script writing → manuscript writing. Each phase is logged for traceability.
- Code & Data: All Rights Reserved