Create structured training datasets with prompts, templates, and validation scripts for ML models.
What it does
HuggingFace datasets have a specific schema format, feature type system, and Hub upload workflow that Claude tends to approximate incorrectly: it uses the wrong column types, omits required metadata fields, or generates upload code that fails silently because the dataset format is invalid. This is HuggingFace's official dataset creation skill. It covers correct schema definition, the datasets library format, validation before upload, and the dataset card format that makes the dataset discoverable. Made by HuggingFace.
Use case
Creating and publishing datasets on HuggingFace Hub — from raw data to a correctly structured, validated, and documented dataset that others can use for training and evaluation.
"Create a HuggingFace dataset from these text/label pairs for sentiment classification."
"Convert this CSV into a correctly structured HuggingFace dataset and push to Hub."
"Add train/validation/test splits to this dataset with the right proportions."
"Generate a dataset card for this dataset with correct metadata."
"Validate this dataset structure before I push it to Hub."
Provide the raw data and describe the task the dataset is for (classification, generation, etc.).
Claude defines the correct schema and feature types for the task.
Claude validates the dataset structure before generating the upload code.
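As a rough illustration of the schema and validation steps, the sketch below checks text/label rows and then builds a dataset with explicit feature types. The label set, the two-column schema, and the helper names (`validate_rows`, `to_columns`, `build_dataset`) are assumptions made for this example, not the skill's actual output. `Features`, `Value`, `ClassLabel`, and `Dataset.from_dict` come from the `datasets` library, which is imported lazily so the validation part runs without it.

```python
LABEL_NAMES = ["negative", "positive"]  # assumed label set for this example


def validate_rows(rows):
    """Return a list of problems; an empty list means the rows look valid."""
    problems = []
    for i, row in enumerate(rows):
        missing = {"text", "label"} - set(row)
        if missing:
            problems.append(f"row {i}: missing column(s) {sorted(missing)}")
            continue
        if not isinstance(row["text"], str) or not row["text"].strip():
            problems.append(f"row {i}: text is empty or not a string")
        if row["label"] not in LABEL_NAMES:
            problems.append(f"row {i}: unknown label {row['label']!r}")
    return problems


def to_columns(rows):
    """Convert row dicts to the column-oriented dict that Dataset.from_dict expects."""
    return {
        "text": [row["text"] for row in rows],
        # ClassLabel stores integer ids, so map label names to their index.
        "label": [LABEL_NAMES.index(row["label"]) for row in rows],
    }


def build_dataset(rows):
    """Validate first, then build a dataset with explicit feature types."""
    problems = validate_rows(rows)
    if problems:
        raise ValueError("invalid rows:\n" + "\n".join(problems))
    # Imported here so validation works even where `datasets` is not installed.
    from datasets import ClassLabel, Dataset, Features, Value

    features = Features(
        {"text": Value("string"), "label": ClassLabel(names=LABEL_NAMES)}
    )
    return Dataset.from_dict(to_columns(rows), features=features)
```

Raising on invalid rows before any upload code runs is one way to avoid the silent failures described above; a real workflow would likely add task-specific checks.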
Input
Raw data (CSV, JSON, text files) and a description of the ML task the dataset supports.
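A minimal sketch of normalizing raw input into row dictionaries, assuming CSV with a header row and JSON in JSON Lines form. The helper names are hypothetical and only the standard library is used. The `datasets` library can also read such files directly, for example with `load_dataset("csv", data_files=...)`.

```python
import csv
import io
import json


def rows_from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_jsonl(text):
    """Parse JSON Lines text (one object per line), skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```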
Output
A correctly structured HuggingFace dataset with proper schema definition, feature types, train/validation/test splits, and a dataset card with correct metadata for discoverability.
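The split and dataset card parts of this output could be sketched as follows. The 80/10/10 proportions, the seed, and the helper names `split_rows` and `card_text` are assumptions for illustration. The metadata keys shown (`license`, `task_categories`, `language`) are fields used in the YAML block at the top of a Hub dataset card, but which fields a given dataset needs is a judgment call.

```python
import random


def split_rows(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Shuffle deterministically and cut into train/validation/test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


def card_text(title, license_id, task_category, language):
    """Build README.md text: a YAML metadata block followed by a title."""
    return (
        "---\n"
        f"license: {license_id}\n"
        "task_categories:\n"
        f"- {task_category}\n"
        "language:\n"
        f"- {language}\n"
        "---\n\n"
        f"# {title}\n"
    )
```

With the `datasets` library, splits like these are typically wrapped in a `DatasetDict` and published with `push_to_hub`, with the card text saved as the repository's README.md.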
npx skills add huggingface/skills/hf-dataset-creator
Requires skills.sh CLI