Add datasets with proof
DefendableDatasets accepts registry-first contributions. Contributors add metadata, receipts, samples, and manifests through pull requests; large files move through NAS or object storage.
Required Metadata
- id, title, slug, description, domain, category, version
- status, access, license, formats, tasks, language
- record_count, size_bytes, created_at, updated_at
- source_type, provenance_summary, intended_use, not_intended_use
- quality_score, validation, files, hashes, receipts
- tags, compatible_models, example_records, citation, links
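The field list above lends itself to a simple presence check before a registry entry is submitted. The sketch below is a minimal illustration, not the project's validator: the field names are copied from the list, while the `missing_metadata_fields` helper and the assumption that a registry entry is a flat JSON object keyed by those names are hypothetical.

```python
# Field names are copied from the Required Metadata list.
# The checker itself is a hypothetical helper, not part of the project's tooling.
REQUIRED_FIELDS = (
    "id", "title", "slug", "description", "domain", "category", "version",
    "status", "access", "license", "formats", "tasks", "language",
    "record_count", "size_bytes", "created_at", "updated_at",
    "source_type", "provenance_summary", "intended_use", "not_intended_use",
    "quality_score", "validation", "files", "hashes", "receipts",
    "tags", "compatible_models", "example_records", "citation", "links",
)


def missing_metadata_fields(entry: dict) -> list[str]:
    """Return the required field names that are absent from a registry entry.

    Only presence is checked; value types and allowed values are not assumed.
    """
    return [name for name in REQUIRED_FIELDS if name not in entry]
```

Calling `missing_metadata_fields(entry)` on a draft entry returns the names still to be filled in, in the same order as the list above; an empty result means every required key is present.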
License and Provenance
Every dataset needs a license, source summary, receipt path, and SHA256 hashes for every file. Public records still need source URLs, retrieval dates, and terms review. Private or member data must stay out of public pull requests and route through gated access controls.
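The per-file SHA256 requirement can be sketched with Python's standard library. This is an illustration only and is not the implementation of `defendable-datasets hash`; the helper names `sha256_file` and `hash_dataset_files`, and the choice to key digests by relative path, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_dataset_files(dataset_dir: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to its SHA256 hex digest.

    The relative-path keying is an assumed layout, not the manifest's
    documented schema.
    """
    return {
        path.relative_to(dataset_dir).as_posix(): sha256_file(path)
        for path in sorted(dataset_dir.rglob("*"))
        if path.is_file()
    }
```

Running `hash_dataset_files(Path("datasets/cre/cre_underwriting_royal_jelly_v1"))` would walk that folder recursively and return one digest per file, which can then be compared against the values recorded for the dataset.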
Dataset Folder Structure
datasets/
  cre/
    cre_underwriting_royal_jelly_v1/
      dataset.card.md
      manifest.json
      samples/
      receipts/
      splits/
  compute/
    compute_gpu_market_comps_v1/
      dataset.card.md
      manifest.json
      samples/
      receipts/
      splits/
Example PR Checklist
- Add or update `/data/registry/datasets.json`.
- Add `datasets/[domain]/[dataset_id]/manifest.json` and `dataset.card.md`.
- Add small samples under `samples/` and receipt files under `receipts/`.
- Put real train, validation, test, or full split files under `splits/` only when licensing permits.
- Run `npm run lint` and `npm run build` before opening the pull request.
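The file-related checklist items can be checked locally before opening the pull request. The sketch below assumes the `datasets/[domain]/[dataset_id]/` layout described above; `missing_layout_items` is a hypothetical helper, and treating `splits/` as optional reflects the "only when licensing permits" condition rather than any documented rule.

```python
from pathlib import Path

# Names taken from the checklist; splits/ is left out because it is only
# added when licensing permits.
REQUIRED_FILES = ("manifest.json", "dataset.card.md")
REQUIRED_DIRS = ("samples", "receipts")


def missing_layout_items(repo_root: Path, domain: str, dataset_id: str) -> list[str]:
    """List the required files and folders missing from a dataset directory."""
    dataset_dir = repo_root / "datasets" / domain / dataset_id
    missing = [name for name in REQUIRED_FILES if not (dataset_dir / name).is_file()]
    missing += [f"{name}/" for name in REQUIRED_DIRS if not (dataset_dir / name).is_dir()]
    return missing
```

For example, `missing_layout_items(Path("."), "compute", "compute_gpu_market_comps_v1")` returns an empty list when the card, manifest, `samples/`, and `receipts/` are all in place.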
CLI Roadmap
Available commands: `defendable-datasets validate`, `defendable-datasets hash`, and `defendable-datasets pack`. Planned next: `defendable-datasets receipt`.