What Agent-Mode ChatGPT Can and Can't Do for Open-Data Exploration
tl;dr
- Goal: Visualize the drivers behind Japan's sika deer increase (decline of hunters, warm winters / reduced snowfall, changes in culling pressure, etc.) in a dashboard to move the discussion forward.
- Hypotheses to test: (1) Fewer/older hunters → weaker population control; (2) Warm winters / less snow → higher survival; (3) Shifts in culling composition → impact on abundance; (next) (4) Policy events & land-use change contribute to increases/decreases.
- Data the AI found and proactively adopted: From the Environmental White Paper: estimated population (quantiles), cull counts (by category), number of hunting license holders (by age); plus daily snow depth → annual aggregates from representative stations (Nara / Niigata / Gifu); and supporting prefecture-level climate indicators (2019).
- Key preprocessing: Convert legacy .XLS → CSV, convert Japanese eras to Gregorian years, normalize headers, pivot wide → long, aggregate snow days / average / max / total snow depth annually. Everything exported as UTF-8 CSV for immediate Tableau import.
- What Agent mode did well: Navigated to embedded Excel tables on government sites → converted → cleaned, bulk-fetched per-year CSVs, light ETL, and automatic file output.
- What Agent mode struggled with: Automated retrieval from login-gated/dynamic pages/old formats; comprehensive prefecture-level population tables still require extra work.
Author's notes
- If you first summarize the hypotheses you want to test (AI can help here too) and pass that as prompt context, you can then simply ask it to "bring good data," and it does a pretty decent job. Even with legacy .xls files, it went and found tools to convert to CSV, which was super helpful.
- Sometimes, even when recent data exists, it returns only partial years because it couldn't reach the latest source. Human verification and curation remain essential.
- If you rely on the Agent, it will return data you can drop into a BI tool and instantly graph (example Tableau view). That's useful, but it also means you may end up with data that yields obvious conclusions rather than deep insights (which is still fine for a first pass).
- As of October 2025, for open-data exploration with AI, it's worth starting even with a half-baked topic, then using an AI Agent in the scoping phase to see what datasets likely exist.
For plain-vanilla charts like this, it quickly pulls data that "just works" (though you'll often find gaps).
Background
Motivation: Why are wild deer and boar increasing so much in Japan? → Preprocessing is a pain → Can ChatGPT handle it?
Plenty of folks want to do exploratory data analysis and hypothesis-driven visualization (in Tableau) when a question pops into their head. But searching, obtaining, and preprocessing public data is a slog (speaking from experience).
So I ran an experiment to see how far AI can take over preprocessing, and specifically what Agent-mode ChatGPT (with browser access, file handling, and light Python) can and can't do.
"Tableau Uma-Uma Kai" (Yum-Yum Meetup)
This started with an event called Tableau Uma-Uma Kai, a very active community where we "build better viz + eat great food." The theme this time was game meat BBQ. It was amazing.

What is "gibier"?
(This section just pasted Perplexity search results.)
"Gibier" refers to meat from wild birds and mammals taken by hunting, and the dishes made from it. [1][2][5]
Etymology & definition
- "Gibier" is French; in Japanese it's rendered as "wild game meat." [2][3]
- Unlike livestock (beef/pork/chicken), it's wild animals like deer, boar, duck, bear, rabbit, pheasant, etc. [6][8][1]
Characteristics
- Because it's hunted, exercise and diet affect flavor and texture. [1]
- Japan has a tradition of eating gibier, and in recent years it's also used in crop-damage control efforts. [3][9]
- In Europe, especially France, it developed as a luxury, aristocratic cuisine. [5][6]
Common examples
- Deer, boar, bear, rabbit, duck, pheasant, pigeon, etc. [7][8][1]
- Dishes like botan-nabe (boar hotpot) and momiji-nabe (deer hotpot) are famous. [3][7]
Notes
- Wild game can carry sanitary risks (e.g., hepatitis E, parasites); cook thoroughly. [2]
Thus gibier is a culinary tradition drawing on nature's bounty, and today it's gaining attention for sustainability and regional revitalization. [6][3]
References: 1 2 3 4 5 6 7 8 9 10
And with that delicious prelude, the question naturally arose: why are wild deer and boar populations increasing? Hence the experiment outlined above.
Experiment overview
To build an MVP (Tableau) dashboard that visually demonstrates key drivers of Japan's sika deer increase (lack of predators, fewer hunters, warm winters / less snow, protection policies, land-use changes), we quickly collected and cleaned open data. This post is a field log of that exploration, and a summary of what Agent-mode ChatGPT (browser automation, file ops, light Python) could and couldn't do.
The data-exploration plan the AI came up with
To craft a persuasive storyline fast, we gathered these MVP essentials:
Long-term trend of estimated population (nationwide, Honshu & south): extract from the Environmental White Paper's Excel tables → CSV
- Includes quantiles (0.05–0.95); use the median (Q0.50) as the main trend line.
Trend in cull counts (nationwide): same White Paper Excel → CSV
- Breakout of hunting / permitted culling / designated projects + total → proxy for human pressure.
Number of hunting license holders (by age, long-term): White Paper Excel (age-by-year) → convert to long format; compute annual totals
- Directly shows decline & aging trends.
Snow & temperature (climate): (1) prefecture-level "annual mean temperature" and "annual snow days" for 2019 (from a published ranking table); (2) as fixed-point snow monitoring, fetch daily snow depth from stations in high-deer prefectures, then aggregate annually (e.g., Nara, Niigata, Gifu)
With these, we have the four core axes (population / culling / hunters / snow) to assemble a minimal yet compelling dashboard.
The AI confidently says "we collected it," but in reality (likely due to token limits) it sometimes couldn't fetch everything in one go. Please forgive the occasional swagger.
The exploration log
1) Environmental White Paper Excel → CSV
Purpose: handle estimated population, culling, and hunter (by age) as annual time series
What we did:
- Navigated from the White Paper PDF table pages to embedded Excel files
- Read `.xlsx` directly; converted legacy `.xls` to CSV via headless LibreOffice
- Normalized Japanese headers, converted Japanese eras to Gregorian years, flattened multi-row headers, and pivoted to long format
Result: A set of clean CSVs you can drop into Tableau; line/area charts come together instantly.
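The legacy-format conversion step can be sketched in Python. Below is a minimal, illustrative way to drive headless LibreOffice for `.xls` → CSV; the `soffice` binary name and output layout are common defaults, but this is an assumption about the approach, not the exact commands the Agent ran:

```python
import subprocess
from pathlib import Path

def libreoffice_convert_cmd(xls_path, out_dir):
    """Build a headless LibreOffice command converting a legacy
    .xls workbook to CSV in out_dir."""
    return [
        "soffice", "--headless",
        "--convert-to", "csv",
        "--outdir", out_dir,
        xls_path,
    ]

def convert_xls_to_csv(xls_path, out_dir):
    """Run the conversion and return the expected CSV path."""
    subprocess.run(libreoffice_convert_cmd(xls_path, out_dir), check=True)
    return Path(out_dir) / (Path(xls_path).stem + ".csv")
```

LibreOffice names the output after the input file's stem, so the resulting CSV can be picked up immediately for header cleanup.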
2) Fixed-point snow data (since 1989; not necessarily all prefectures)
Purpose: test the warm-winter / less-snow hypothesis using the same station over time
What we did:
- Downloaded per-year CSVs (station × year)
- Built daily snow depth (mm) for Nara (Katsuragi), Niigata (Niigata), Gifu (Takayama)
- Aggregated snow days / avg / max / total snow depth annually
Take: Nara is warm and snow days are extremely rare, so to show "less snow → higher survival," snowy regions have more explanatory power. Niigata's year-to-year variability is great for storytelling.
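The annual aggregation above (snow days / avg / max / total) can be sketched without pandas. This is a minimal illustration assuming daily rows of (ISO date, snow depth in mm); here the average is taken over all recorded days, one of several reasonable definitions:

```python
from collections import defaultdict

def aggregate_snow(daily_rows):
    """Aggregate daily snow depth (mm) into annual metrics:
    snow days (depth > 0), average, max, and total depth.
    `daily_rows` is an iterable of (iso_date, snow_depth_mm) tuples."""
    by_year = defaultdict(list)
    for date, depth_mm in daily_rows:
        by_year[int(date[:4])].append(depth_mm)  # year from "YYYY-MM-DD"
    out = {}
    for year, depths in sorted(by_year.items()):
        out[year] = {
            "snow_days": sum(1 for d in depths if d > 0),
            "avg_depth_mm": sum(depths) / len(depths),
            "max_depth_mm": max(depths),
            "total_depth_mm": sum(depths),
        }
    return out
```

For a real run, the grouping key would more precisely be a snow season (e.g., July–June) rather than a calendar year, but the mechanics are the same.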
Where we stumbled (and how we worked around it)
- Legacy .xls reading → not handled out of the box by Python; solved with LibreOffice CSV conversion.
- Multi-row headers, Japanese eras, full-width spaces → normalize column names, map era → Gregorian, pivot to long.
- Cross-domain restrictions / login requirements → some portals limit automated downloads. We aimed for primary Excel/CSV when possible; otherwise switched to alternative open sources or fixed-point monitoring.
- Prefer station fixed points over prefectural averages → prefecture averages can be spotty by year/metric; station × year gives long, reproducible series.
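The header-cleanup part of the workarounds above can be sketched as follows. NFKC normalization folds full-width characters (including ideographic spaces) into their half-width forms; the JP→EN mapping entries are hypothetical examples, not the actual White Paper column names:

```python
import re
import unicodedata

# Illustrative JP -> EN header mapping (hypothetical examples)
JP_TO_EN = {
    "年度": "year",
    "都道府県": "prefecture",
    "積雪深": "snow_depth_mm",
}

def normalize_header(raw):
    """Normalize a messy Japanese header: fold full-width chars to
    half-width (NFKC), strip linebreaks/spaces, then map known
    Japanese names to English."""
    s = unicodedata.normalize("NFKC", raw)
    s = re.sub(r"\s+", "", s)  # removes newlines and (former full-width) spaces
    return JP_TO_EN.get(s, s)
```

Running every raw header through one function like this keeps the cleanup reproducible when the same tables are re-fetched next year.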
Preprocessing recipe (what we actually did)
- Tidy column names (JP → EN, remove spaces/linebreaks/full-width chars)
- Convert Japanese era → Gregorian via a mapping table
- Pivot wide → long
- Force strings → numeric; treat missing as `NaN` or 0 by context
- Annual aggregates: snow days = count of days with `snow_depth_mm > 0`, plus avg/max/total snow depth
- Export: UTF-8 CSV (works across Tableau/Sheets/Python)
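The era → Gregorian step in the recipe can be sketched with a small offset table (Gregorian year = era year + offset). This is a minimal version covering the eras that appear in long government time series; the label format handled here is an assumption about typical White Paper cells:

```python
import re

# Era start offsets: Gregorian year = era year + offset
ERA_OFFSETS = {"昭和": 1925, "平成": 1988, "令和": 2018}

def era_to_gregorian(label):
    """Convert labels like '平成7年' (Heisei 7) or '令和元年'
    (first year of Reiwa, written with 元) to a Gregorian year."""
    m = re.match(r"(昭和|平成|令和)(元|\d+)年?", label)
    if not m:
        raise ValueError(f"unrecognized era label: {label!r}")
    era, num = m.groups()
    year = 1 if num == "元" else int(num)
    return year + ERA_OFFSETS[era]
```

The 元 ("first year") spelling is worth handling explicitly; it shows up instead of 1 in most official tables.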
Minimal storyboard in Tableau
"Minimal" because we're discussing an MVP, per the prior context.
Estimated population (Q0.50) trend: national time-series line
Culling stack: stacked bars for hunting / permitted / designated projects
Aging of hunters: area chart or population pyramid by age band
Fixed-point snow comparison: annual snow-day counts, Niigata (snowy) vs. Nara (warm)
- Highlight periods where warm/low-snow years overlap with population increase
(Optional) Annotation layer: vertical reference lines for policy shifts (e.g., end of doe-hunting bans)
Agent-mode ChatGPT: What it did well
Exploration & reachability on official sites
- Navigated to embedded Excel in the White Paper
- Devised a way to bulk-fetch per-year station CSVs
File conversion / cleaning / joins
- `.xls` → CSV conversion, read `.xlsx`, fix mixed Japanese headers
- Light ETL: long-formatting, annual aggregation
"Ready-to-use" output
- UTF-8 CSV works in Tableau/Sheets/Python
- Column names & types are BI-friendly
If your goal is to get to a usable state quickly, it's an excellent fit.
Agent-mode ChatGPT: What it struggled with / caveats
Login-gated / API-keyed sites & dynamic pages
- Logins, cross-domain rules, and cookies are blockers.
- Workarounds: go straight to primary Excel/CSV; if not possible, narrow to representative stations or switch to another primary source.
Legacy .xls & Japanese-specific header quirks
- Without extra libs, it can get stuck.
- Workarounds: headless LibreOffice conversion; skip rows + regex to clean.
Full coverage under tight time
- E.g., clean, standardized prefecture-level deer population tables aren't an instant fetch.
- Approach: ship a convincing MVP (national trend + culling + hunters + fixed-point snow), then expand regionally.
Unit / definition mismatch
- Snow metrics vary (snowfall vs. snow depth vs. snow days) and units (cm/mm/days).
- Approach: state definitions explicitly; encode units in column names (e.g., `snow_depth_mm`).
Reproducible workflow (make it your team standard)
Acquisition
- White Paper Excel → `/raw/env_whitepaper/*.xlsx|.xls`
- Station × year CSV → `/raw/snow/{station}/{year}.csv.gz`
Light ETL
- Convert (`.xls` → CSV), normalize headers, era → Gregorian, numeric casting
- Wide → long; ensure keys (year / prefecture / station) align
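The wide → long reshape in this ETL step can be sketched without pandas. A minimal pure-Python version, where the column and key names are purely illustrative:

```python
def wide_to_long(rows, id_col, value_name):
    """Pivot a wide table (one column per year) into long format:
    one record per (key, year, value) cell. `rows` are dicts whose
    non-id keys are year columns."""
    long_rows = []
    for row in rows:
        key = row[id_col]
        for col, value in row.items():
            if col == id_col:
                continue  # keep the identifier, pivot everything else
            long_rows.append({id_col: key, "year": int(col), value_name: value})
    return long_rows
```

In practice `pandas.melt` does the same job in one call; the point is that each yearly cell becomes its own row, which is what Tableau expects for time-series charts.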
Features
- Snow days; avg/max/total snow depth
- Hunter age-band shares; median age (optional)
Validation views
- Lines (population, culls), stacked bars (cull composition), pyramid (hunters), snow comparison
Export
- `/out/csv/*.csv` → import to Tableau; bind to a workbook template
Make it even better (practical tips)
- Gradually add (representative station × multiple prefectures) for snow (Hokkaido, Aomori, Akita, Niigata, Nagano, Gifu, etc.).
- Create a policy-event timeline (end of doe-hunting bans, expansion of control programs) and overlay vertical markers to propose causal hypotheses.
- Add land-use proxies (abandoned farmland, forest age structure, broadleaf share); even 1–2 indicators help.
- Build a data dictionary early (column names, units, definitions, source URLs, refresh cadence).
Conclusion
- Agent-mode ChatGPT excels at getting to primary data it can reach, cleaning it fast, and putting it on BI rails.
- It struggles with login-gated APIs, heavily dynamic sites, and legacy formats; use workarounds (alternative sources, representative stations, format conversion, a small manual step).
- Even just the combo here (national trend + culls + hunters + fixed-point snow) is a strong starting point to illustrate "warm winters / less snow × population dynamics." From here, layer in regional detail and policy events to evolve a decision-worthy dashboard.
Appendix 1: Hypotheses and the data sought/obtained
| Hypothesis | Aim | Data sought/obtained (source) | Status / notes |
|---|---|---|---|
| Fewer/older hunters weaken control capacity | Track long-term hunter counts & age structure | Hunting license holders (by age, national) (Environmental White Paper Excel) | Obtained & cleaned (long-format, annual totals) |
| Climate change (warm winters / less snow) raises overwinter survival | Check if less snow broadens "survivable conditions" | Daily snow depth since 1989 (stations): Nara (Katsuragi), Niigata (Niigata), Gifu (Takayama) → annual aggregates (snow days / avg / max / total) | Obtained & aggregated (Nara = rare snow; Niigata = big year-to-year swings) |
| Population increased since the 1990s | Visualize long-term rise | Estimated population (Honshu & south; quantiles 0.05–0.95) (White Paper Excel) | Obtained, CSV; Q0.50 (median) as main series |
| Changes in culling composition affect abundance | Understand composition & totals | Deer culls (hunting / permitted / designated projects / total) (White Paper Excel) | Obtained, CSV; stacked annual viz |
| (Supporting) Warm-winter tendency at prefecture scale | Align station data with prefecture scale | Prefecture-level: annual mean temp & snow days (2019) (published rankings) | Obtained (2019 aligned across indicators) |
| (Next) Policy / legal changes contribute to trends | Overlay events with dynamics | Timeline: end of doe bans, expansion of damage-control programs, etc. | In progress (MVP lists candidate events) |
| (Next) Habitat change (abandoned farmland, forest type) | Test resource/cover expansion | Abandoned farmland area trend (national/prefectural), forest statistics | To be added (currently a candidate list) |
| (Next) Prefecture-level population / culls | Strengthen regional explanatory power | Prefecture-level culls / slaughter counts; population estimates | Some access constraints → phase in later |
Note: Wolves as "natural predators" are handled as historical qualitative context; quantitative comparison isn't feasible.
Appendix 2: Main tools/features used in Agent mode (this run)
Acquisition
- Browser automation: navigated from White Paper tables to embedded Excel and downloaded
- Bulk per-year CSVs (station × year): fetched annual snow-depth archives (Nara/Niigata/Gifu)
Conversion / preprocessing
- Headless LibreOffice: legacy `.xls` → CSV batch conversion (`.xlsx` read directly)
- Python (pandas / csv / gzip):
- Normalize headers (remove JP newlines/full-width spaces)
- Era → Gregorian conversion
- Wide → long reshaping; numeric casting (contextual 0/NaN)
- Daily snow → annual aggregates (snow days, avg/max/total)
- Minimal shell tasks: organize downloads and file layout (`/raw` → `/out`)
Validation / viz-prep
- Export as UTF-8 CSV (Tableau/Google Sheets/Python-friendly)
- Tableau-ready schema (date/year keys, long format, explicit units like `snow_depth_mm`)
Constraints & workarounds
- Login/cookie-dependent portals & strong cross-domain restrictions → go to primary Excel/CSV directly, or pivot to representative stations
- Legacy .xls & multi-row headers → LibreOffice conversion + normalization
- Prefecture-level climate metrics may have holes → use long fixed-point series as the backbone