Fine-tuning LLaMA 3 on Nepali Text

Nepali is spoken by over 17 million people, yet it remains severely underrepresented in large language models. While LLaMA 3, Mistral, and GPT-4 have some multilingual capability, their Nepali performance lags significantly behind English, especially for formal written Nepali, technical terminology, and culturally specific contexts. Fine-tuning a LLaMA 3 model on Nepali text is one practical way to narrow that gap.

This guide covers the complete process: understanding when fine-tuning is the right approach, setting up the QLoRA training pipeline, preparing Nepali datasets, running training on a GPU, and pushing your model to HuggingFace Hub so the community can benefit. This is a technically advanced tutorial — if you're new to LLMs, you may want to start with the RAG chatbot guide first.

When to Fine-tune

When Should You Fine-tune vs RAG vs Prompt Engineering?

This is the most important question in applied LLM work. Fine-tuning is expensive, time-consuming, and often unnecessary. Before starting, work through this decision tree:

Fine-tune vs RAG vs Prompting: Decision Tree

1. 🤔 Can a well-crafted system prompt solve your problem?
   YES → Prompt Engineering. Free, instant, no infra. Use few-shot examples, chain-of-thought. Best first option.
   NO ↓

2. Is your data dynamic or frequently updated?
   YES → RAG (Retrieval Augmented Generation). Best for Q&A over documents, knowledge bases, live data. Update without retraining.
   NO ↓

3. Do you need custom style, language, or task format?
   YES → Fine-tuning (This Guide!). Custom language (Nepali!), specific domain tone, structured output format, or when base model is weak on your task.
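
The three questions above can be written down as a small function. This is only an illustrative sketch: the function name `choose_approach` and its boolean flags are hypothetical, and real projects often combine approaches (for example, RAG on top of a fine-tuned model).

```python
def choose_approach(prompt_is_enough: bool,
                    data_is_dynamic: bool,
                    needs_custom_style: bool) -> str:
    """Walk the three questions in order and return the first match."""
    if prompt_is_enough:
        return "prompt engineering"
    if data_is_dynamic:
        return "RAG"
    if needs_custom_style:
        return "fine-tuning"
    # The tree does not cover this case; revisit the prompt or the task definition.
    return "undecided"


# A Nepali assistant where prompting falls short and the data is static:
print(choose_approach(False, False, True))
```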

QLoRA — How It Works

QLoRA: Efficient Fine-tuning with Low-Rank Adapters

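Fully fine-tuning the 8B-parameter LLaMA 3 model means updating every weight: about 16GB in 16-bit precision, or 32GB in FP32. Once gradients and optimizer states are added, the total typically exceeds 80GB of VRAM. QLoRA (Quantized Low-Rank Adaptation) makes fine-tuning practical on a single 16–24GB GPU by combining two ideas: 4-bit quantization, which shrinks the frozen weights to roughly a quarter of their 16-bit size, and LoRA adapters, which train only a tiny fraction of the parameters.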
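
To make the memory argument concrete, here is a rough back-of-the-envelope estimate of weight storage alone. It assumes roughly 8 billion parameters for LLaMA 3 8B and ignores gradients, optimizer states, activations, and quantization overhead, so real usage will be higher.

```python
PARAMS = 8.0e9  # assumption: approximate parameter count of LLaMA 3 8B

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "4-bit (NF4)": 0.5,
}

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>12}: {weight_memory_gb(PARAMS, nbytes):5.1f} GB")
```

The 4-bit figure is why the frozen base model fits comfortably on a 16GB card, leaving room for the adapters, their optimizer states, and activations.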

QLoRA Architecture: Frozen Weights + Trainable Adapters

🔒 Original weight matrix W (❄️ frozen, not updated during training)
d × k dimensions (e.g. 4096 × 4096), stored in 4-bit NF4 format: ~8MB per matrix (vs ~32MB in FP16)

🔓 LoRA adapters B and A (✅ trainable, only these get updated)
Matrix B: d × r, Matrix A: r × k, with rank r = 16
Params: 2 × 4096 × 16 = 131K per 4096 × 4096 matrix

Forward pass: W' = W + BA (scaled by α/r)

Only 0.1–1% of parameters are trained. Full model: ~8B params. LoRA adapters at r = 16 on all attention and MLP projections: ~42M params (about 0.5%).
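
As a toy illustration of the forward pass above, the sketch below applies a frozen weight matrix plus a scaled low-rank update using plain Python lists. The tiny dimensions and the values of `W`, `A`, and `B` are made up for readability. It follows the W' = W + BA convention, with B of shape d × r and A of shape r × k.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_weight(W, B, A, alpha, r):
    """Return W' = W + (alpha / r) * B @ A without modifying W."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * delta for w, delta in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

# Toy sizes: d = k = 3, rank r = 1 (real layers use e.g. 4096 x 4096 with r = 16)
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]          # frozen pre-trained weight (d x k)
B = [[1.0], [0.0], [2.0]]      # trainable, d x r
A = [[0.5, 0.0, 0.5]]          # trainable, r x k

W_adapted = lora_weight(W, B, A, alpha=2, r=1)
print(W_adapted)

# LoRA starts training with B set to zero, so the adapted model initially equals the base model.
B_zero = [[0.0], [0.0], [0.0]]
print(lora_weight(W, B_zero, A, alpha=2, r=1) == W)
```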

LoRA Mathematics

The key insight behind LoRA is that the weight updates during fine-tuning have a low intrinsic rank — meaning they can be well approximated by a product of two small matrices, even though the full weight matrix is huge.

W' = W + BA, where B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), rank r ≪ d,k

W is the frozen pre-trained weight (d×k). B and A are the trainable LoRA matrices with rank r (typically 4–64). The product BA has rank at most r, so it captures a low-dimensional update to W. The scaling factor α/r controls the magnitude of the update, so the effective weight is W + (α/r)·BA. For d=k=4096 and r=16, LoRA adds just 131K params (2 × 4096 × 16) vs 16.8M for the full matrix, a 128x reduction.
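
To see how the choice of rank affects adapter size for one projection, the sketch below repeats the arithmetic from this paragraph for several ranks in the typical 4–64 range, assuming a square 4096 × 4096 weight matrix.

```python
D = K = 4096  # dimensions of one square projection matrix, as in the example above

def lora_param_count(d: int, k: int, r: int) -> int:
    """B is d x r and A is r x k, so the adapter has r * (d + k) parameters."""
    return r * (d + k)

full = D * K
for r in (4, 8, 16, 32, 64):
    adapter = lora_param_count(D, K, r)
    print(f"r={r:>2}: {adapter:>7,} adapter params, {full // adapter}x fewer than {full:,}")
```

Doubling the rank doubles the adapter size, which is why rank is usually the first knob to turn when trading capacity against memory.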

Step-by-Step Training

Step 1: Install Dependencies

bash

# Install all required packages for QLoRA fine-tuning
# (version specifiers are quoted so the shell does not treat ">" as a redirect)
pip install "transformers>=4.40.0" \
            "peft>=0.10.0" \
            "datasets>=2.18.0" \
            "accelerate>=0.28.0" \
            "bitsandbytes>=0.43.0" \
            "trl>=0.8.0" \
            huggingface_hub \
            wandb \
            scipy

# Log in to HuggingFace (to access LLaMA 3 — requires approval from Meta)
huggingface-cli login

# Log in to W&B for experiment tracking (optional but recommended)
wandb login

# Check GPU memory
python -c "import torch; print(f'GPU: {torch.cuda.get_device_name(0)}'); print(f'VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')"

⚠️ GPU Requirements: Minimum Specs

LLaMA 3 8B with QLoRA (4-bit): Minimum 16GB VRAM (RTX 4080, RTX 3090, A100)
LLaMA 3 70B with QLoRA: Minimum 48GB VRAM; requires multi-GPU or an 80GB A100/H100
Inference only (4-bit): 8B model fits in 8GB VRAM (RTX 3070, RTX 4060 Ti)
Nepal context: Most Nepali engineers won't have local GPUs. Use Google Colab A100, Kaggle (30 hours/week of free GPU), or RunPod ($0.40/hr for RTX 3090)
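
The thresholds above can be turned into a quick check. The helper below is a hypothetical convenience function, not part of any library; it simply encodes the minimum-VRAM figures listed in this section.

```python
def recommend_setup(vram_gb: float) -> str:
    """Map available VRAM (in GB) to the largest option listed above that fits."""
    if vram_gb >= 48:
        return "QLoRA fine-tuning of LLaMA 3 70B (or 8B)"
    if vram_gb >= 16:
        return "QLoRA fine-tuning of LLaMA 3 8B"
    if vram_gb >= 8:
        return "4-bit inference only with LLaMA 3 8B"
    return "rent a cloud GPU (Colab, Kaggle, RunPod)"

for vram in (6, 8, 16, 24, 80):
    print(f"{vram:>3} GB -> {recommend_setup(vram)}")
```

On a machine with PyTorch and a CUDA GPU, the input could come from `torch.cuda.get_device_properties(0).total_memory / 1e9`, the same call used in the memory check above.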

💡 Low-Cost GPU Options for Training

HuggingFace ZeroGPU on Spaces provides free GPU access for inference demos. For training, Colab Pro (a paid plan) offers A100 access, subject to availability. A typical QLoRA fine-tuning run on a 10k-sample Nepali dataset takes about 2–3 hours on an A100, well within Colab Pro's limits.

Alternatively, use Kaggle Notebooks (30 hours/week of free T4 GPU) for smaller experiments, then scale to Colab for full training runs. The T4 does not support bfloat16, so switch the compute and training precision to float16 there.

Step 2: Load Model in 4-bit Quantization

python

# load_model.py — Load LLaMA 3 with 4-bit quantization (QLoRA setup)
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from huggingface_hub import login
import os

# ── HuggingFace authentication ──
# LLaMA 3 requires accepting Meta's license on hf.co/meta-llama
login(token=os.environ.get("HF_TOKEN"))

MODEL_ID = "meta-llama/Meta-Llama-3-8B"  # or Meta-Llama-3-8B-Instruct

# ── 4-bit quantization configuration ──
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # Enable 4-bit loading
    bnb_4bit_quant_type="nf4",             # NormalFloat4 — best for normal distributions
    bnb_4bit_compute_dtype=torch.bfloat16,  # Compute in bf16 (numerically stable)
    bnb_4bit_use_double_quant=True,         # Double quantisation saves extra ~0.4 bits/param
)

print(f"Loading {MODEL_ID} in 4-bit...")
print(f"Expected VRAM: ~5GB for 8B model in 4-bit")

# ── Load model ──
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",           # automatically distribute across available GPUs
    torch_dtype=torch.bfloat16,
    trust_remote_code=False,
)

# ── Load tokenizer ──
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    trust_remote_code=False,
    padding_side="right",  # right padding is needed for SFTTrainer
)

# LLaMA 3 doesn't have a pad token by default
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

# Check memory usage
print(f"Model loaded. GPU memory used: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"Model parameters: {model.num_parameters():,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
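
To build intuition for what 4-bit loading does to the weights, here is a deliberately simplified sketch of block-wise absmax quantization in plain Python. It is not the NF4 algorithm used by bitsandbytes: NF4 uses 16 non-uniform levels derived from a normal distribution, whereas this sketch uses evenly spaced integer levels in [-7, 7]. The function names, sample weights, and block size are illustrative assumptions.

```python
def quantize_block(block):
    """Quantize a list of floats to integers in [-7, 7] plus one float scale."""
    absmax = max(abs(x) for x in block) or 1.0   # avoid division by zero for all-zero blocks
    scale = absmax / 7
    codes = [round(x / scale) for x in block]    # each code fits in 4 bits
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.40, 0.03, 0.70, -0.25, 0.00, 0.55, -0.68]
BLOCK_SIZE = 4  # illustrative; real implementations use larger blocks

restored = []
for start in range(0, len(weights), BLOCK_SIZE):
    block = weights[start:start + BLOCK_SIZE]
    codes, scale = quantize_block(block)
    restored.extend(dequantize_block(codes, scale))

max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("restored:", [round(x, 3) for x in restored])
print("max abs error:", round(max_error, 4))
```

Each block stores one scale plus a 4-bit code per weight, so the rounding error is bounded by half a quantization step within that block.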

Step 3: Prepare the Nepali Dataset

python

# prepare_dataset.py — Format Nepali training data for SFT
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer
import json
import re

TOKENIZER_ID = "meta-llama/Meta-Llama-3-8B"
MAX_LENGTH = 2048

# ── Conversation format for LLaMA 3 ──
# LLaMA 3 uses a specific chat template with special tokens
def format_nepali_example(example: dict) -> dict:
    """
    Format training examples as LLaMA 3 chat format.
    Input: {"instruction": "...", "input": "...", "output": "..."}
    Output: {"text": "<|begin_of_text|><|start_header_id|>system..."}
    """
    system_prompt = (
        "तपाईं एक सहायक हुनुहुन्छ जो नेपाली भाषामा राम्रोसँग जवाफ दिन्छ। "
        "सधैं स्पष्ट, सटीक र उपयोगी जानकारी प्रदान गर्नुहोस्।"
    )
    # English: "You are an assistant who responds well in Nepali language.
    # Always provide clear, accurate and useful information."

    instruction = example.get("instruction", "")
    context    = example.get("input", "")
    output     = example.get("output", "")

    # Build user message ("सन्दर्भ" means "Context")
    user_message = instruction
    if context:
        user_message = f"{instruction}\n\nसन्दर्भ: {context}"

    # LLaMA 3 chat format
    text = (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{output}<|eot_id|>"
    )

    return {"text": text}


def load_nepali_dataset(data_path: str) -> Dataset:
    """
    Load and format a Nepali instruction dataset.
    Expected format: JSONL with instruction/input/output fields
    """
    # Option 1: Load from a local JSONL file
    if data_path.endswith(".jsonl"):
        examples = []
        with open(data_path, "r", encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    examples.append(json.loads(line))
        dataset = Dataset.from_list(examples)
        print(f"Loaded {len(dataset)} examples from {data_path}")

    # Option 2: Load from HuggingFace Hub by repo ID (e.g. "user/dataset-name")
    else:
        dataset = load_dataset(data_path, split="train")
        print(f"Loaded {len(dataset)} examples from HuggingFace")

    # Format all examples
    dataset = dataset.map(
        format_nepali_example,
        remove_columns=dataset.column_names,
        desc="Formatting examples",
    )

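    # Filter by length
    # The text already contains <|begin_of_text|>, so skip automatic special tokens when counting
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)
    def filter_by_length(example):
        token_ids = tokenizer(example["text"], add_special_tokens=False)["input_ids"]
        return len(token_ids) <= MAX_LENGTH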

    dataset = dataset.filter(filter_by_length, desc="Filtering by length")
    print(f"After filtering: {len(dataset)} examples")

    # Train/val split
    split = dataset.train_test_split(test_size=0.05, seed=42)
    print(f"Train: {len(split['train'])} | Val: {len(split['test'])}")

    return split


# ── Show sample ──
if __name__ == "__main__":
    # Replace with your actual dataset path
    dataset = load_nepali_dataset("Shushant/nepali-alpaca")
    print("\nSample training example:")
    print(dataset["train"][0]["text"][:500])
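
Before formatting, it can help to sanity-check raw records. The sketch below shows what a single JSONL record in the instruction/input/output format looks like and applies two simple checks: required fields are non-empty, and the output is mostly Devanagari. The helper names and the 0.5 threshold are hypothetical choices for illustration, not part of the pipeline above.

```python
import json

def devanagari_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in the Devanagari block (U+0900 to U+097F)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0900" <= ch <= "\u097F" for ch in letters) / len(letters)

def is_valid_record(record: dict, min_ratio: float = 0.5) -> bool:
    """Keep records with a non-empty instruction and a mostly-Devanagari output."""
    instruction = record.get("instruction", "").strip()
    output = record.get("output", "").strip()
    return bool(instruction) and bool(output) and devanagari_ratio(output) >= min_ratio

# One line of a JSONL file, as expected by the local-file loading path
line = '{"instruction": "नेपालको राजधानी कुन हो?", "input": "", "output": "नेपालको राजधानी काठमाडौं हो।"}'
record = json.loads(line)

print(is_valid_record(record))
print(is_valid_record({"instruction": "Translate", "output": "Kathmandu is the capital."}))
```

A script-ratio filter like this would also reject legitimate outputs that are mostly code or English terminology, so the threshold should be tuned to the dataset.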

Step 4: Configure LoRA

python

# configure_lora.py — Set up PEFT LoRA adapters
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
import torch

def configure_lora(model) -> object:
    """
    Apply LoRA adapters to LLaMA 3 for parameter-efficient fine-tuning.
    Only the LoRA adapter weights (A and B matrices) will be trained.
    """

    # ── Prepare model for k-bit training ──
    # This enables gradient checkpointing and handles dtype casting
    model = prepare_model_for_kbit_training(
        model,
        use_gradient_checkpointing=True,  # saves memory, trains slower
    )

    # ── LoRA Configuration ──
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,  # language modelling objective

        # Rank of LoRA matrices (higher = more params = better but slower)
        # r=16 is a good default; try r=8 for less data, r=32 for more
        r=16,

        # LoRA scaling factor: effective update = (alpha/r) * BA
        # alpha=32 with r=16 gives scaling of 2.0
        lora_alpha=32,

        # Which weight matrices to apply LoRA to
        # For LLaMA: attention projections + MLP gates
        target_modules=[
            "q_proj",    # query projection
            "k_proj",    # key projection
            "v_proj",    # value projection
            "o_proj",    # output projection
            "gate_proj", # MLP gate
            "up_proj",   # MLP up
            "down_proj", # MLP down
        ],

        # Dropout on LoRA layers (regularisation)
        lora_dropout=0.05,

        # Don't update biases (saves params)
        bias="none",

        # Inference mode is off during training
        inference_mode=False,
    )

    # ── Apply LoRA to model ──
    model = get_peft_model(model, lora_config)

    # ── Print trainable parameter count ──
    total_params    = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    pct = 100 * trainable_params / total_params

    print(f"Total parameters:     {total_params:,}")
    print(f"Trainable parameters: {trainable_params:,} ({pct:.3f}%)")
    print(f"Frozen parameters:    {total_params - trainable_params:,}")

    return model
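
The trainable-parameter count printed by this function can be estimated by hand. The sketch below does so for the seven target modules, assuming the published LLaMA 3 8B shapes (hidden size 4096, MLP intermediate size 14336, 32 layers, and key/value projections of width 1024 because of grouped-query attention). Treat those shapes as assumptions and check them against `model.config` for the checkpoint you load.

```python
R = 16           # LoRA rank used in this guide
NUM_LAYERS = 32  # assumption: LLaMA 3 8B

# (input features, output features) for each targeted projection; assumed shapes
MODULE_SHAPES = {
    "q_proj":    (4096, 4096),
    "k_proj":    (4096, 1024),
    "v_proj":    (4096, 1024),
    "o_proj":    (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj":   (4096, 14336),
    "down_proj": (14336, 4096),
}

def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A is r x in_features and B is out_features x r, so r * (in + out) in total."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params in total:  {total:,}")
```

Under these assumptions the adapters come to roughly 42 million parameters, about half a percent of the 8B base model.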

Step 5: Full Training with SFTTrainer

python

# train_nepali_llama.py — Complete QLoRA training script
import torch
import wandb
# Note: SFTConfig and its argument names vary between trl releases; check the docs for your installed version
from trl import SFTTrainer, SFTConfig
from load_model import model, tokenizer
from configure_lora import configure_lora
from prepare_dataset import load_nepali_dataset
import os

# ── Weights & Biases tracking ──
wandb.init(
    project="nepali-llama3-finetune",
    name="qlora-8b-nepali-v1",
    config={
        "model": "Meta-Llama-3-8B",
        "method": "QLoRA",
        "language": "Nepali",
        "lora_r": 16,
        "lora_alpha": 32,
    }
)

# ── Apply LoRA ──
model = configure_lora(model)

# ── Load dataset ──
dataset = load_nepali_dataset("Shushant/nepali-alpaca")

# ── Training configuration ──
training_args = SFTConfig(
    output_dir="./outputs/nepali-llama3-qlora",

    # Training duration
    num_train_epochs=3,
    max_steps=-1,   # -1 = use num_train_epochs

    # Batch size (micro-batching for gradient accumulation)
    per_device_train_batch_size=2,    # 2 examples per GPU per step
    gradient_accumulation_steps=8,   # effective batch = 2×8 = 16
    per_device_eval_batch_size=4,

    # Optimiser
    optim="paged_adamw_8bit",         # memory-efficient 8-bit AdamW
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,                 # gradient clipping

    # Learning rate schedule
    warmup_ratio=0.03,                 # 3% of steps for warmup
    lr_scheduler_type="cosine",       # cosine decay

    # Precision
    fp16=False,
    bf16=True,                         # bfloat16 (requires Ampere GPU)

    # Sequence length
    max_seq_length=2048,

    # Evaluation & checkpointing
    eval_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=200,
    save_total_limit=3,               # keep only 3 latest checkpoints
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",

    # Logging
    logging_dir="./logs",
    logging_steps=25,
    report_to="wandb",

    # Packing short sequences for efficiency
    packing=False,   # set True if most examples are short

    # Dataset formatting
    dataset_text_field="text",
)

# ── Initialise trainer ──
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

# ── Start training ──
print("Starting QLoRA fine-tuning of LLaMA 3 on Nepali text...")
print(f"Training examples: {len(dataset['train'])}")
print(f"Validation examples: {len(dataset['test'])}")
print(f"Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print()

trainer.train()

# ── Save the final model ──
print("\nSaving fine-tuned model...")
trainer.save_model("./outputs/nepali-llama3-qlora/final")
tokenizer.save_pretrained("./outputs/nepali-llama3-qlora/final")

wandb.finish()
print("Training complete!")
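
It helps to know roughly how many optimizer steps a run will take and what the learning rate does over time. The sketch below uses the settings from the script (per-device batch 2, accumulation 8, 3 epochs, 3% warmup, cosine decay) with a hypothetical training set of 10,000 examples on one GPU. The schedule function is a simplified re-implementation of linear warmup followed by cosine decay, not the library code, and the trainer's exact step count may differ slightly.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Approximate number of optimizer updates for a full run."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    return math.ceil(num_examples / effective_batch) * epochs

def lr_at_step(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

steps = total_optimizer_steps(10_000, per_device_batch=2, grad_accum=8, epochs=3)
print("total steps:", steps)
for s in (0, 57, steps // 2, steps):
    print(f"step {s:>5}: lr = {lr_at_step(s, steps):.2e}")
```

With these numbers the learning rate peaks at the end of warmup (step 57) and then decays smoothly to zero by the last step.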

Training Hyperparameter Reference

Hyperparameter	Value Used	Range	Effect
LoRA rank (r)	16	4–64	Higher = more capacity, more VRAM, slower
LoRA alpha (α)	32	8–128	α/r scaling factor; 2.0 is a good default
LoRA dropout	0.05	0–0.1	Regularisation; 0 if you have lots of data
Learning rate	2e-4	1e-5 – 5e-4	Higher = faster but can diverge
Batch size (effective)	16	8–64	Larger = more stable gradients, needs more memory
Epochs	3	1–5	More data → fewer epochs needed
Warmup ratio	0.03	0.01–0.05	% of steps for LR warmup
LR scheduler	cosine	linear/cosine/constant	Cosine decay tends to give cleaner convergence
Quantization	NF4 4-bit	4-bit/8-bit	4-bit saves most memory; 8-bit is more stable
Max sequence length	2048	512–4096	Longer = better for long texts, needs more VRAM

Training Loss — Expected Progress

[Figure: Training Loss Curve, Nepali LLaMA 3 QLoRA. Train and validation loss plotted against training step, with loss decreasing from ~2.8 to ~0.9 over 1,000 steps.]

Step 0–100: Rapid loss drop as the model adapts to Nepali script and the instruction format

Step 100–500: Gradual improvement while learning vocabulary and grammar patterns

Step 500–1000: Fine-grained tuning of cultural context and idiomatic expressions
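
One way to read these loss values is to convert them to perplexity, the exponential of the cross-entropy loss. This assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training. The numbers below are the illustrative start and end values quoted above, not measured results.

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss measured in nats."""
    return math.exp(loss)

for loss in (2.8, 0.9):
    print(f"loss {loss:.1f} -> perplexity {perplexity(loss):.2f}")
```

If validation loss starts rising while training loss keeps falling, the model is likely overfitting and an earlier checkpoint is usually the better choice.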

Step 6: Run Inference

python

# inference.py — Run inference with the fine-tuned Nepali model
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B"
ADAPTER_PATH  = "./outputs/nepali-llama3-qlora/final"

# ── Option 1: Load base model + LoRA adapters separately ──
# (Useful during development — can swap adapters easily)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)

# ── Option 2: Merge LoRA into base model (faster inference) ──
# merged_model = model.merge_and_unload()
# merged_model.save_pretrained("./merged_model")


def generate_nepali(
    instruction: str,
    context: str = "",
    max_new_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 0.9,
) -> str:
    """Generate a Nepali response for a given instruction."""

    # Use the same system prompt as in training so the prompts match
    system = (
        "तपाईं एक सहायक हुनुहुन्छ जो नेपाली भाषामा राम्रोसँग जवाफ दिन्छ। "
        "सधैं स्पष्ट, सटीक र उपयोगी जानकारी प्रदान गर्नुहोस्।"
    )
    user_msg = f"{instruction}\n\nसन्दर्भ: {context}" if context else instruction

    prompt = (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=top_p,
            do_sample=True,
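            pad_token_id=tokenizer.eos_token_id,
            # Stop on <|eot_id|> as well: it ends each assistant turn in the chat format
            eos_token_id=[
                tokenizer.eos_token_id,
                tokenizer.convert_tokens_to_ids("<|eot_id|>"),
            ],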
            repetition_penalty=1.1,
        )

    # Decode only the new tokens (not the prompt)
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    response   = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response.strip()


# ── Test examples ──
examples = [
    {
        "instruction": "नेपालको राजधानी सहरको बारेमा बताउनुहोस्।",
        "description": "Simple factual question about Nepal"
    },
    {
        "instruction": "मेसिन लर्निङ भनेको के हो? सरल भाषामा बुझाउनुहोस्।",
        "description": "Explain machine learning in Nepali"
    },
    {
        "instruction": "एउटा Python कोड लेख्नुहोस् जसले नेपाली वर्णमाला प्रिन्ट गर्छ।",
        "description": "Code generation in Nepali"
    },
]

print("Testing fine-tuned Nepali LLaMA 3:\n")
for ex in examples:
    print(f"Task: {ex['description']}")
    print(f"Instruction: {ex['instruction']}")
    response = generate_nepali(ex["instruction"])
    print(f"Response: {response}")
    print("-" * 60 + "\n")

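The `temperature` and `top_p` arguments used during generation control how the next token is sampled. The toy sketch below shows the idea on a made-up four-token distribution: temperature rescales the logits before the softmax, and top-p keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p`. It is a simplified illustration in plain Python, not the implementation used by `generate`.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                                # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Return indices of the most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

logits = [4.0, 3.0, 1.0, -2.0]   # made-up scores for four candidate tokens
for t in (0.7, 2.0):
    probs = softmax_with_temperature(logits, t)
    kept = top_p_filter(probs, top_p=0.9)
    print(f"temperature {t}: probs={[round(p, 3) for p in probs]} kept={kept}")
```

A higher temperature flattens the distribution, so more tokens survive the top-p cut and the output becomes more varied.
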
Step 7: Push to HuggingFace Hub

python

# push_to_hub.py — Share your fine-tuned model with the community
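from huggingface_hub import HfApi, login
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import os
import torch

# -- Authentication --
# Read the token from the environment rather than hard-coding it in the script
login(token=os.environ.get("HF_TOKEN"))

REPO_ID = "hexcodenepal/llama3-8b-nepali-qlora"  # your HuggingFace username/model-name
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B"
ADAPTER_PATH = "./outputs/nepali-llama3-qlora/final"

# -- Load the base model in 16-bit (so the adapter can be merged) plus the trained adapter --
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_PATH)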

# -- Option A: Push LoRA adapter only (small, fast) --
# Others can use it with the base model
model.push_to_hub(
    REPO_ID,
    private=False,      # True if you don't want public access
    safe_serialization=True,
)
tokenizer.push_to_hub(REPO_ID)

# -- Option B: Push merged model (larger but no dependency on base) --
print("Merging LoRA adapters into base model...")
merged = model.merge_and_unload()
merged.save_pretrained("./merged_llama3_nepali", safe_serialization=True)
tokenizer.save_pretrained("./merged_llama3_nepali")

api = HfApi()
api.create_repo(repo_id=REPO_ID + "-merged", private=False, exist_ok=True)
api.upload_folder(
    folder_path="./merged_llama3_nepali",
    repo_id=REPO_ID + "-merged",
    repo_type="model",
)

# -- Create Model Card --
model_card = (
    "---\n"
    "language: ne\n"
    "license: llama3\n"
    "base_model: meta-llama/Meta-Llama-3-8B\n"
    "tags:\n"
    "  - llama3\n"
    "  - nepali\n"
    "  - qlora\n"
    "---\n\n"
    "# LLaMA 3 8B Fine-tuned on Nepali Text\n\n"
    "QLoRA fine-tuned version of Meta-Llama-3-8B on Nepali instruction data.\n\n"
    "## Training Details\n"
    "- Base model: meta-llama/Meta-Llama-3-8B\n"
    "- Method: QLoRA (4-bit NF4 + LoRA rank 16)\n"
    "- Dataset: Nepali Alpaca + custom data\n"
    "- Training time: ~3 hours on 1x A100 40GB\n\n"
    "Developed by HexCode Nepal.\n"
)

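# The folder was already uploaded above, so write the model card and upload it separately
with open("./merged_llama3_nepali/README.md", "w", encoding="utf-8") as f:
    f.write(model_card)

api.upload_file(
    path_or_fileobj="./merged_llama3_nepali/README.md",
    path_in_repo="README.md",
    repo_id=REPO_ID + "-merged",
    repo_type="model",
)

print(f"Adapter pushed to: https://huggingface.co/{REPO_ID}")
print(f"Merged model pushed to: https://huggingface.co/{REPO_ID}-merged")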

Nepali Datasets

Nepali Dataset Sources

Dataset	Source	Size	Type	License
Nepali Alpaca	Shushant/nepali-alpaca (HuggingFace)	~52k examples	Instruction following	Apache 2.0
Nepali Wikipedia	HuggingFace datasets: wikipedia (ne)	~50k articles	Pre-training / knowledge	CC-BY-SA
Nepali News Corpus	GitHub: sanjaalcorps/NepaliNLP	~100k articles	Language modelling	CC BY
FLORES-200 Nepali	Meta/flores-200 (HuggingFace)	~1k sentences	Translation benchmark	CC BY-SA 4.0
Nepali StorySet	GitHub: nepali-nlp	~10k stories	Text generation	MIT
Oscar Nepali	oscar-corpus/OSCAR-2301 (ne)	~200MB	Web crawl / pre-training	CC0
Custom crawl	Wikipedia, Kantipur, eKantipur, Nagarik	Variable	Domain-specific	Check individual TOS

Conclusion

The Path Forward for Nepali NLP

You now have a complete QLoRA fine-tuning pipeline for adapting LLaMA 3 to Nepali text. The techniques here — 4-bit quantization, LoRA adapters, SFTTrainer — represent the current state of the art for efficient LLM fine-tuning. With these tools, a single engineer with access to a rented GPU can produce models that would have required a team and significant compute budget just two years ago.

Nepali NLP is at an exciting inflection point. The language has enough online presence to gather meaningful training data, but is underrepresented enough that well-trained models create genuine value. Applications like Nepali language customer service bots, government document assistants, educational tools for students in rural Nepal, and Nepali content generation systems are all within reach.

Please open-source what you build. Push your datasets and models to HuggingFace Hub. Write about your process. The Nepali NLP community is small but growing, and every contribution — whether a 500-example dataset or a fine-tuned model — moves the field forward for everyone.

Shiv Shankar Sah
