HuggingFace Dataset Creator

Create structured training datasets with prompts, templates, and validation scripts for ML models.

Claude / Claude Code
GitHub Copilot
Cursor
VS Code
OpenAI Codex
Data Analysis, Data Analyst, Researcher, Developer

What it does

HuggingFace datasets have a specific schema format, feature type system, and Hub upload workflow. Without guidance, Claude tends to approximate these incorrectly: it uses the wrong column types, omits required metadata fields, or generates upload code that fails silently because the dataset format is invalid. This is HuggingFace's official dataset creation skill. It covers correct schema definition, the datasets library format, validation before upload, and the dataset card format that makes the dataset discoverable. Made by HuggingFace.

Use case

Creating and publishing datasets on HuggingFace Hub — from raw data to a correctly structured, validated, and documented dataset that others can use for training and evaluation.

The Prompt

Copy and use immediately
"Create a HuggingFace dataset from these text/label pairs for sentiment classification."
"Convert this CSV into a correctly structured HuggingFace dataset and push to Hub."
"Add train/validation/test splits to this dataset with the right proportions."
"Generate a dataset card for this dataset with correct metadata."
"Validate this dataset structure before I push it to Hub."
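For the split prompt above, the following sketch shows one way to partition rows into train/validation/test using only the standard library. The 80/10/10 proportions, the seed, and the helper name `split_indices` are assumptions for illustration; the source does not say which proportions the skill chooses.

```python
import random


def split_indices(n_rows, train=0.8, validation=0.1, seed=42):
    """Return shuffled row indices partitioned into train/validation/test."""
    if not 0 < train < 1 or not 0 <= validation < 1 or train + validation >= 1:
        raise ValueError("train and validation must leave room for a test split")
    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train)
    n_val = int(n_rows * validation)
    return {
        "train": indices[:n_train],
        "validation": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # whatever remains
    }


splits = split_indices(100)
```

With the `datasets` library, a similar result could be obtained by applying `Dataset.train_test_split` twice, once to carve off the held-out portion and once to divide it into validation and test.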

How to use

  1. Provide the raw data and describe the task the dataset is for (classification, generation, etc.).

  2. Claude defines the correct schema and feature types for the task.

  3. Claude validates the dataset structure before generating the upload code.
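The validation step could be sketched as a plain pre-upload check over the raw records. This is an assumption about what such a check might cover (required columns, non-empty text, known labels), not the skill's actual validation script; `validate_records` and the sample records are hypothetical.

```python
def validate_records(records, required_columns, allowed_labels):
    """Return a list of readable problems; an empty list means the check passed."""
    problems = []
    for i, record in enumerate(records):
        missing = [c for c in required_columns if c not in record]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue  # the remaining checks need every column present
        if not isinstance(record["text"], str) or not record["text"].strip():
            problems.append(f"row {i}: 'text' must be a non-empty string")
        if record["label"] not in allowed_labels:
            problems.append(f"row {i}: unknown label {record['label']!r}")
    return problems


# Hypothetical records: one valid row and three rows with different defects.
records = [
    {"text": "Great product", "label": "positive"},
    {"text": "", "label": "negative"},
    {"text": "Meh", "label": "neutral"},
    {"label": "positive"},
]
problems = validate_records(records, ("text", "label"), {"negative", "positive"})
```

Collecting every problem rather than stopping at the first one lets all defects be fixed in a single pass before any upload code runs.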

Input / Output

Input

Raw data (CSV, JSON, text files) and a description of the ML task the dataset supports.

Output

A correctly structured HuggingFace dataset with proper schema definition, feature types, train/validation/test splits, and a dataset card with correct metadata for discoverability.
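A dataset card on the Hub is a README.md file that opens with a YAML metadata block. The sketch below assembles such a card as a string, using the standard metadata keys `pretty_name`, `license`, `task_categories`, and `language`. The helper name `build_dataset_card` and all values passed to it are placeholders chosen for the example.

```python
def build_dataset_card(pretty_name, license_id, task_categories, languages, description):
    """Assemble README.md text: YAML metadata block followed by a short body."""
    lines = ["---"]
    lines.append(f"pretty_name: {pretty_name}")
    lines.append(f"license: {license_id}")
    lines.append("task_categories:")
    lines.extend(f"- {task}" for task in task_categories)
    lines.append("language:")
    lines.extend(f"- {lang}" for lang in languages)
    lines.append("---")
    lines.append("")
    lines.append(f"# {pretty_name}")
    lines.append("")
    lines.append(description)
    return "\n".join(lines) + "\n"


# Placeholder values for a hypothetical sentiment dataset.
card = build_dataset_card(
    pretty_name="Sentiment Pairs",
    license_id="apache-2.0",
    task_categories=["text-classification"],
    languages=["en"],
    description="Text and label pairs for binary sentiment classification.",
)
```

Plain string formatting does not escape YAML special characters, so a YAML library would be the safer choice for values that may contain colons or quotes.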

Added 15 Mar 2026 · Submitted by huggingface · 👁 45 · 📋 0

Details

Platforms
Claude / Claude Code, GitHub Copilot, Cursor, VS Code, OpenAI Codex
Category
Data Analysis
License
apache-2.0

Stats

📋 Copies: 0
👁 Views: 45
👍 Upvotes: 0

Install with skills.sh

npx skills add huggingface/skills/hf-dataset-creator

Requires skills.sh CLI
