Metadata Quality Audit of the Indonesia One Data Portal
Hands-on audit of 200 datasets from data.go.id across the 20 largest ministries/agencies · May 5, 2026.
- Authored by: Orionex Research Team
- Publication date: May 5, 2026
- Reading time: ~8 min
- Reference: Annex to the AI Incubation for Public Sector proposal
Findings at a Glance
Four headline numbers that capture the scale and urgency of metadata quality issues on the Indonesia One Data portal.
Top 20 Ministries / Agencies by Dataset Count
All 20 ministries and agencies with the most datasets fall in the POOR classification, with a completeness score of 0.000 across the entire sample.
Bars represent each agency's published dataset count (Walidata filter on the portal). Red marks the POOR classification across the entire sample; none reached PARTIAL or COMPLETE.
17 SDI Fields · 0% Filled
Not a single one of the 17 Satu Data Indonesia (SDI) fields is filled across the 200 audited datasets. Total: 0 of 3,400 slots.
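As an illustration of how the slot count and the score fit together, the following minimal sketch applies the completeness definition and thresholds given under Methodology (filled SDI fields / 17; POOR < 0.40, PARTIAL 0.40–0.79, COMPLETE ≥ 0.80). The rule for what counts as "filled" (anything other than `None` or blank text) and the all-empty stand-in sample are assumptions made for the example, not the audit's actual code or raw data.

```python
SDI_FIELD_COUNT = 17  # number of SDI fields, as stated in the report


def completeness_score(values):
    """Return filled / 17 for one dataset.

    `values` holds the 17 SDI field values; None or blank text counts
    as unfilled (an assumption about what "filled" means).
    """
    if len(values) != SDI_FIELD_COUNT:
        raise ValueError("expected exactly 17 SDI field values")
    filled = sum(1 for v in values if v is not None and str(v).strip() != "")
    return filled / SDI_FIELD_COUNT


def classify(score):
    """Map a score to the report's three classes."""
    if score < 0.40:
        return "POOR"
    if score < 0.80:
        return "PARTIAL"
    return "COMPLETE"


# Stand-in for the audited sample: 200 datasets with every field empty,
# mirroring the reported finding rather than reproducing the raw records.
sample = [[None] * SDI_FIELD_COUNT for _ in range(200)]

scores = [completeness_score(dataset) for dataset in sample]
total_slots = len(sample) * SDI_FIELD_COUNT
average_score = sum(scores) / len(scores)
classes = {classify(s) for s in scores}

print(total_slots, f"{average_score:.3f}", classes)
```

Because a score can only take the values k/17, the PARTIAL band effectively covers 7 to 13 filled fields and COMPLETE starts at 14.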
Three Headline Findings
Patterns that hold consistently across all 20 audited ministries and agencies.
Not one of the 200 datasets has any SDI metadata filled
Across the entire sample of the 20 largest ministries/agencies the average completeness score is 0.000, with 0 out of 3,400 SDI field slots filled. Every dataset displays the portal's own SDI non-compliance warning banner.
Every dataset within an agency shares the same publish date
Uniform publish dates suggest bulk uploads without any metadata enrichment. Examples: all 10 Kabupaten Demak datasets were published on 2023-05-22, and all 10 Provinsi Bali datasets on 2023-07-24.
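A check of this kind can be sketched in a few lines: group the sampled records by agency and flag any agency whose datasets all share one publish date. The `(agency, publish_date)` record shape and the function name are hypothetical; only the two agencies and dates quoted above come from the audit.

```python
from collections import defaultdict


def agencies_with_uniform_dates(records):
    """Return {agency: date} for agencies whose datasets share one publish date.

    `records` is an iterable of (agency, publish_date) pairs; this shape
    is an assumption for the example.
    """
    dates_by_agency = defaultdict(set)
    for agency, publish_date in records:
        dates_by_agency[agency].add(publish_date)
    return {
        agency: next(iter(dates))
        for agency, dates in dates_by_agency.items()
        if len(dates) == 1
    }


# The two examples quoted in the finding, 10 datasets each.
records = (
    [("Kabupaten Demak", "2023-05-22")] * 10
    + [("Provinsi Bali", "2023-07-24")] * 10
)

uniform = agencies_with_uniform_dates(records)
print(uniform)
```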
Most datasets do not include a downloadable file
Only 24% of datasets ship a data file (XLSX, JSON, SHP, CSV, XLS). The remaining 76% only show a "Request Data" button, so the portal acts as an empty metadata catalog rather than an actual data repository.
AutoInsight by Orionex — AI metadata enrichment for the Indonesia One Data portal
Filling 9,525,151 empty metadata slots manually is not feasible. Extrapolating from this sample to all 560,303 datasets on data.go.id (17 SDI fields each, at 45 minutes of analyst time per dataset) yields 420,227 analyst-hours and IDR 168.1 billion in labor cost.
AutoInsight is the Orionex AI platform that automatically extracts, infers, and fills the 17 SDI fields based on dataset titles, document content, and the publishing agency context. Projected average accuracy is 81% (≥ 14 of 17 fields), which would reduce the analyst's task to a 2-minute review of the AI draft per dataset.
This report is authored by the Orionex Research Team as an annex to the AI Incubation for Public Sector proposal aimed at strengthening data quality on the Indonesia One Data portal, with AutoInsight as the proposed solution.
Impact: With vs Without AutoInsight
Projected impact applied to all 560,303 datasets on data.go.id.
| Metric | Without AutoInsight | With AutoInsight |
|---|---|---|
| Time per dataset | 45 min | 2 min (AI draft review) |
| Total analyst-hours | 420,227 hrs | 18,677 hrs |
| Total cost | IDR 168.1 B | IDR 7.5 B |
| Completion time (50 analysts) | 4.8 years | 1.9 months |
| Completeness score | 0.00 | ≥ 0.81 |
| % datasets POOR | 100% | < 5% |
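The arithmetic behind the slot, hour, and cost rows can be reproduced with the sketch below. The dataset count, field count, and minutes per dataset come from the report. The hourly rate of IDR 400,000 is not stated anywhere in it; it is inferred here by dividing IDR 168.1 billion by 420,227 hours and should be treated as an assumption. The completion-time row is left out because it depends on working hours per analyst, which the report does not give.

```python
TOTAL_DATASETS = 560_303      # datasets on data.go.id, per the report
SDI_FIELDS = 17               # SDI fields per dataset, per the report
MANUAL_MINUTES = 45           # manual time per dataset, per the table
REVIEW_MINUTES = 2            # AI draft review time per dataset, per the table
HOURLY_RATE_IDR = 400_000     # ASSUMPTION: inferred from 168.1 B / 420,227 hrs


def project(minutes_per_dataset):
    """Return (analyst-hours rounded to whole hours, cost in IDR)."""
    hours = TOTAL_DATASETS * minutes_per_dataset / 60
    cost = hours * HOURLY_RATE_IDR
    return round(hours), cost


empty_slots = TOTAL_DATASETS * SDI_FIELDS
manual_hours, manual_cost = project(MANUAL_MINUTES)
review_hours, review_cost = project(REVIEW_MINUTES)

print(f"Empty slots:        {empty_slots:,}")
print(f"Manual:             {manual_hours:,} hrs, IDR {manual_cost / 1e9:.1f} B")
print(f"With draft review:  {review_hours:,} hrs, IDR {review_cost / 1e9:.1f} B")
```

Under these inputs the sketch matches the table's slot count, analyst-hours, and cost figures to the precision shown.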
Regional Context
Indonesia operates the largest open-data portal in the region, yet metadata completeness sits near zero. That combination makes it both the largest gap and the largest transformation opportunity.
Methodology
Technical notes and known limitations of this audit.
- Sampling method: First 10 datasets on each agency's default listing page, for the 20 ministries/agencies with the highest dataset count under the Walidata filter.
- Parser & data source: Next.js SSR HTML from data.go.id, parsed via Playwright MCP and WebFetch. Metadata fields detected via the DOM pattern div.font-bold (label) + div.overflow-hidden (value).
- Sample size: 200 datasets (20 agencies × 10 datasets each). Audit date: May 5, 2026.
- Completeness score definition: Completeness score = number of filled SDI fields / 17. Classification: POOR < 0.40 · PARTIAL 0.40–0.79 · COMPLETE ≥ 0.80.
- Limitations: A 10-dataset sample per agency may not represent the full portfolio; subsequent listing pages were not audited. Cross-validation shows the 100% POOR pattern is consistent across all government tiers (province, city, regency), indicating a systemic finding rather than sampling bias.
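The label/value DOM pattern named in the parser note can be illustrated with a minimal sketch. The audit itself used Playwright MCP and WebFetch; this example swaps in Python's standard-library `html.parser` on a static HTML string so that it runs without a browser. It assumes each `div.font-bold` label is followed by its `div.overflow-hidden` value and that neither contains nested `div` elements. The sample markup, the label names, and the `-` placeholder are hypothetical and are not copied from data.go.id.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Pair each div.font-bold (label) with the next div.overflow-hidden (value)."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._stack = []     # role of every open <div>: "label", "value" or None
        self._label = None   # label waiting for its value
        self._buf = []       # text collected for the current label/value div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "font-bold" in classes:
            role = "label"
        elif "overflow-hidden" in classes and self._label is not None:
            role = "value"
        else:
            role = None
        if role is not None:
            self._buf = []
        self._stack.append(role)

    def handle_data(self, data):
        # Only keep text that sits directly inside a label or value div.
        if self._stack and self._stack[-1] is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != "div" or not self._stack:
            return
        role = self._stack.pop()
        text = "".join(self._buf).strip()
        if role == "label":
            self._label = text
        elif role == "value":
            self.pairs[self._label] = text
            self._label = None


# Hypothetical markup following the described pattern.
SAMPLE_HTML = """
<div class="flex"><div class="font-bold">Label A</div><div class="overflow-hidden">Value A</div></div>
<div class="flex"><div class="font-bold">Label B</div><div class="overflow-hidden"> - </div></div>
"""

parser = LabelValueParser()
parser.feed(SAMPLE_HTML)
pairs = parser.pairs
print(pairs)
```

Whether a value such as `-` should count as unfilled is a separate decision that would need to be made against the portal's real markup.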
Raw Audit Data
200 records from the dataset inspection, filterable by ministry/agency or searchable by title.
This report is authored by the Orionex Research Team as an annex to the AI Incubation for Public Sector proposal for the Indonesia One Data portal (data.go.id). Proposed solution: AutoInsight by PT Orionex Solusi Digital.
Audit date: May 5, 2026 · Source: data.go.id (Indonesia One Data Portal).