Important Keywords
Token
A token is a piece of text — could be a word, part of a word, or even punctuation.
Example:
The sentence:
"ChatGPT is smart!"
Breaks down into tokens like:
- "Chat", "G", "PT", " is", " smart", "!"
Each model uses its own tokenizer. GPT usually breaks words into sub-words.
Why does it matter?
- Models process tokens, not raw text.
- More tokens = higher cost and slower response.
- There are limits (e.g., GPT-4 Turbo supports a context window of ~128k tokens).
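To make the sub-word idea concrete, here is a toy greedy longest-match tokenizer. The vocabulary is invented purely for illustration; real tokenizers (like GPT's byte-pair encoding) learn their vocabulary from data.

```python
# Toy greedy longest-match sub-word tokenizer.
# VOCAB is hand-picked for this example; real models learn theirs.
VOCAB = {"Chat", "G", "PT", " is", " smart", "!"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest piece that matches at position i.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("ChatGPT is smart!"))
# ['Chat', 'G', 'PT', ' is', ' smart', '!']
```

Note how "ChatGPT" splits into three tokens while " is" keeps its leading space, just like in the example above.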
Prompt
A prompt is the input or question you give to the AI.
Example:
“Write a story about a robot who makes coffee.”
The AI takes your prompt and generates a response.
Completion / Output
The AI’s response to your prompt.
Example:
If prompt is:
“Tell me a joke”
The completion might be:
“Why did the computer go to therapy? Because it had too many bugs!”
Temperature
Temperature is a float value (typically 0.0 to 2.0) that adjusts the probability distribution over possible next words when the model is generating text.
Simple Explanation: Think of temperature like a creativity knob:
- Low temperature → the model plays it safe (predictable, accurate).
- High temperature → the model becomes more creative (or chaotic!).
How It Works (in simple terms):
Language models generate the next word by picking from many possible words, each with a probability.
Temperature changes how sharp or flat that probability curve is.
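The sharpen-or-flatten effect can be sketched in a few lines (the logits below are invented for illustration; a real model produces one logit per vocabulary word):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before softmax:
    # T < 1 sharpens the distribution (top word dominates),
    # T > 1 flattens it toward uniform. T must be > 0;
    # T -> 0 approaches greedy argmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for "mat", "sofa", "floor"
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 1.5)
print(cold[0])  # near 1.0: "mat" almost always wins
print(hot[0])   # much smaller: "sofa" and "floor" get real chances
```

With the same logits, low temperature pushes nearly all probability mass onto "mat", while high temperature spreads it across all three candidates.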
Examples:
Let’s say the model is trying to generate the next word after:
"The cat sat on the"
1. temperature = 0.0 (deterministic)
- Always picks the highest-probability word.
- 👉 Output: "mat"
2. temperature = 0.7 (balanced)
- A bit of randomness.
- 👉 Output: "mat", "sofa", or "floor"
3. temperature = 1.5 (high creativity)
- Very random.
- 👉 Output: "rocket", "cloud", or "spoon"
Common Settings
Temperature | Behavior | Use Case |
---|---|---|
0.0 | Deterministic | Facts, math, code generation |
0.5 | Balanced | General-purpose conversation |
1.0 | Creative | Storytelling, poem generation |
>1.2 | Very creative | Wild ideas, brainstorming |
In Short:
Temperature controls how boring or bold your AI's response is.
Top-k Sampling
Top-k sampling is a method where the model:
Only considers the top k most likely next words, and randomly picks one from them based on their probabilities.
Why do we use it?
To control randomness and reduce weird outputs by not letting the model choose from all possible words (some of which have tiny, junky probabilities).
How It Works:
Imagine the model predicts the next word in a sentence, and it gives probabilities for 50,000 possible words.
- Without Top-k: it can choose from all 50,000, even if some are very unlikely.
- With Top-k = 5: it picks only from the top 5 most likely words, and samples randomly among those.
Example:
The model is generating the next word for:
"The pizza tastes"
Top predicted probabilities:
Word | Probability |
---|---|
delicious | 0.45 |
great | 0.20 |
amazing | 0.15 |
awful | 0.10 |
burnt | 0.05 |
wooden | 0.01 |
spicy | 0.01 |
... | ... |
- Top-k = 3 → consider only: delicious, great, amazing
- Pick one of them randomly, weighted by their probabilities.
- "awful" or "burnt" will not be considered.
When to use:
Top-k Value | Behavior |
---|---|
k = 1 | Always picks the top choice (deterministic) |
k = 10 | Balanced randomness |
k = 50+ | More creative or surprising |
Bonus: Often used with Temperature
- First apply Top-k to get a shortlist.
- Then apply Temperature to adjust randomness within that shortlist.
In Simple Words:
Top-k sampling = “Only pick from the top k best options”, then choose one based on probability.
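The pizza example can be sketched like this. The probabilities are copied from the table above; applying temperature within the shortlist (the "bonus" step) is done here by raising each probability to the power 1/T and renormalizing, which is one common way to combine the two:

```python
import random

def top_k_sample(probs: dict, k: int, temperature: float = 1.0) -> str:
    # 1. Keep only the k most likely words.
    shortlist = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    words, weights = zip(*shortlist)
    # 2. Apply temperature within the shortlist, then renormalize.
    weights = [w ** (1.0 / temperature) for w in weights]
    total = sum(weights)
    weights = [w / total for w in weights]
    # 3. Sample one word, weighted by its renormalized probability.
    return random.choices(words, weights=weights, k=1)[0]

probs = {"delicious": 0.45, "great": 0.20, "amazing": 0.15,
         "awful": 0.10, "burnt": 0.05, "wooden": 0.01, "spicy": 0.01}
print(top_k_sample(probs, k=3))  # always one of: delicious, great, amazing
```

With k = 3, "awful", "burnt", "wooden", and "spicy" can never be chosen, no matter how many times you sample.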
Max Tokens
Max tokens controls how long the output from a language model like GPT can be.
In Simple Terms:
“Max tokens” = the maximum number of tokens (pieces of text) the model is allowed to generate.
What is a Token?
A token is not exactly a word — it's a piece of text.
Text | Tokens |
---|---|
Hello | 1 |
ChatGPT | 1 |
unbelievable | 2 |
I love pizza. | 4 |
😊 (emoji) | 1 |
2025-07-02 | 4 |
(Counts are illustrative; the exact split varies by tokenizer.)
So max_tokens limits the number of tokens, not characters or full words.
How It Works
If you set max_tokens = 50, the model will stop generating after 50 tokens, even if it hasn’t finished its sentence.
This helps:
- 🚫 Avoid super long or endless outputs
- 💰 Control costs (API pricing is often token-based)
- 📦 Fit within token limits (e.g., 4096 or 8192 total)
Important:
The input + output tokens together must stay within the model’s total token limit:
Model | Token Limit |
---|---|
GPT-3.5 | ~4,096 tokens |
GPT-4 | ~8,192 to 32,768 |
GPT-4 Turbo | ~128,000 |
Example
Prompt:
"Write a short poem about cats."
If you set max_tokens = 20, the output might be:
Possible Output:
"Cats in sunbeams play,
Softly purring through the day..."
It stops there, even if the poem isn’t finished, because it hit the 20-token limit.
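The cutoff behavior can be mimicked with a dummy token stream. Here fake_poem is a hand-made stand-in for a real model's token-by-token output:

```python
def generate_with_limit(token_stream, max_tokens: int) -> str:
    # Stop after max_tokens tokens, even mid-sentence.
    out = []
    for i, token in enumerate(token_stream):
        if i >= max_tokens:
            break
        out.append(token)
    return "".join(out)

# A stand-in "model" that just yields pre-made tokens.
fake_poem = ["Cats", " in", " sun", "beams", " play", ",", " softly",
             " purr", "ing", " through", " the", " day", "..."]
print(generate_with_limit(fake_poem, max_tokens=5))
# Cats in sunbeams play
```

The output is cut after exactly 5 tokens, mid-poem, just like a real max_tokens limit.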
Use Cases
Use Case | Recommended Max Tokens |
---|---|
Short answers (FAQs) | 10–50 |
Chatbots | 50–200 |
Story/essay generation | 200–1000+ |
Code generation | Depends, usually 100–800 |
Stop Sequence
A stop sequence is a custom string or token that tells the language model:
"Stop generating text once you see this."
It’s like saying:
“As soon as you see this word/phrase, cut off the output!”
Why Use Stop Sequences?
- To control where the output ends
- To avoid unnecessary or repeated text
- To simulate structured conversation (like ending after one message)
Example 1: Chatbot Message
Suppose your prompt is a chat transcript that ends mid-conversation, and you set stop = ["User:"].
The model generates the assistant’s reply and stops before printing "User:" again, avoiding generating the user’s next turn in the conversation.
Example 2: Multiple-Choice Question
Ask a multiple-choice question and set stop = ["\n"].
The model stops as soon as it hits the first newline (\n), giving a short, single-line answer ✅
Example 3: JSON Completion
Give the model a partially written JSON object to complete and set stop = ["}"].
It stops right before the closing brace, which is useful for structured outputs.
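API backends implement this by truncating the output at the first match of any stop string; a minimal sketch:

```python
def apply_stop_sequences(text: str, stops: list) -> str:
    # Cut the output at the earliest occurrence of any stop string.
    # The stop string itself is not included in the result.
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "Assistant: Sure, here you go!\nUser: thanks"
print(apply_stop_sequences(raw, ["User:"]))
# prints the assistant turn only; "User:" and everything after is cut
```

The same function covers all three examples: pass ["User:"] for the chatbot, ["\n"] for the one-line answer, or ["}"] for the JSON completion.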
Summary Table
Feature | What it Does |
---|---|
Stop Sequence | Halts generation when the model outputs a match |
Type | String or list of strings (e.g., ["User:", "\n"]) |
Common Use | Chatbots, JSON, code, Q&A, structured text |
A stop sequence tells the model: “Stop writing when you hit this word or phrase.”
Fine-tuning
Fine-tuning is the process of training a pre-trained language model on your own custom dataset, so it learns to give more specific, domain-relevant, or personalized responses.
In Simple Words:
You're teaching a smart AI a special skill or style, on top of what it already knows.
Analogy
Imagine GPT is like a chef who can cook all kinds of food.
With fine-tuning, you're teaching the chef to cook your grandma’s secret recipes perfectly.
Now, the chef (GPT) still knows everything but becomes super good at your specific style.
Why Fine-tune a Model?
To make it:
- Talk in your brand voice
- Answer in domain-specific knowledge (e.g., medicine, law, finance)
- Follow specific response formats
- Speak in a different language or tone
- Act like a custom assistant or bot
How Fine-tuning Works (Step-by-Step)
- Start with a base model (like GPT-3.5 or LLaMA)
- Prepare a dataset of input-output pairs (called prompts and completions)
- Train the model on this data using a few passes (called epochs)
- The model updates its internal weights slightly to favor your examples
Example Dataset
A fine-tuning dataset is a collection of prompt-completion pairs written in your target style. After fine-tuning on such examples, the model tends to respond in that style and tone, even to similar but not identical questions.
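As an illustrative sketch, here is a tiny two-example dataset written in the JSONL chat format that the OpenAI fine-tuning API accepts. The brand voice ("PizzaBot") and the questions are invented:

```python
import json

# Each line of the JSONL file is one training example:
# a short conversation ending with the completion we want to teach.
examples = [
    {"messages": [
        {"role": "system", "content": "You are PizzaBot, a cheerful pizzeria assistant."},
        {"role": "user", "content": "Do you deliver on Sundays?"},
        {"role": "assistant", "content": "We sure do! Sunday is our busiest slice day 🍕"},
    ]},
    {"messages": [
        {"role": "system", "content": "You are PizzaBot, a cheerful pizzeria assistant."},
        {"role": "user", "content": "What's your most popular pizza?"},
        {"role": "assistant", "content": "Our wood-fired Margherita wins every time!"},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

A real dataset would contain dozens to thousands of such lines; the more consistent the assistant answers are, the more reliably the fine-tuned model picks up the style.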
Fine-tuning vs Prompt Engineering
Feature | Fine-tuning | Prompt Engineering |
---|---|---|
Changes model? | Yes (updates internal weights) | No (just changes the prompt) |
Custom training? | Needs your dataset | Just uses clever wording |
Cost? | Higher (training & hosting) | Lower (just inference) |
Flexibility | More control over behavior | Limited, but easier |
When to NOT Fine-tune
- If you just want minor tweaks → use prompt engineering or function calling
- If data is confidential → be careful about what you upload
- If your use case is simple or short-lived