You say the word for "mother" and your tutor winces — you just said "horse." Or worse, "scold." Same syllable, ma, four wildly different meanings, and the only thing that separates them is how your pitch moves. For English speakers this is the single hardest part of Mandarin, because English uses pitch for emotion, not for meaning.
Most tone guides hand you a Pinyin chart and an audio clip and leave it there. That works until you hit a syllable you have never heard pronounced. This guide bridges Pinyin to IPA — the same notation that makes reading dictionary pronunciations possible across any language — so you can read a tone off the page and know exactly what your pitch should do.
Why Mandarin Tones Change Word Meaning
In Mandarin, pitch is phonemic. That means a change in pitch contour changes the word, the way swapping /p/ for /b/ turns "pat" into "bat" in English. Pitch is not decoration on top of the word — it is part of the word.
The classic demonstration is the syllable ma. With four different pitch contours it becomes four unrelated words:
- mā (妈) — "mother"
- má (麻) — "hemp"
- mǎ (马) — "horse"
- mà (骂) — "scold"
This is why a Mandarin sentence with the wrong tones is not just "accented." It can be unparseable, or it can mean something you never intended. Linguists measuring functional load in Standard Chinese have found that tones carry about as much information as vowels, so dropping the tones discards a large share of the signal.
English does have pitch, but it operates at the sentence level (intonation), not the word level. We raise pitch at the end of a yes/no question and stress the word we want to emphasize. None of that changes the dictionary meaning of "cat." Mandarin reuses the pitch dimension English reserves for feelings and assigns it to the lexicon instead.
The Four Tones in Pinyin, Chao Numbers, and IPA
Linguists describe tone pitch with Chao tone numbers, a 1-to-5 scale (1 = lowest, 5 = highest) invented by Yuen Ren Chao. A two- or three-digit string traces the pitch contour over the syllable: 35 means "start at mid (3), rise to high (5)." The IPA borrowed Chao's iconic tone letters (˥ ˦ ˧ ˨ ˩), which draw the same contour as little staff lines.
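The digit-to-letter correspondence is mechanical, so it can be sketched in a few lines of Python. This is only an illustration: the names `CHAO_TO_LETTER` and `chao_to_tone_letters` are made up for this example and do not come from any particular library.

```python
# Chao pitch levels 1 (lowest) to 5 (highest) mapped to IPA tone letters.
CHAO_TO_LETTER = {"1": "˩", "2": "˨", "3": "˧", "4": "˦", "5": "˥"}


def chao_to_tone_letters(chao: str) -> str:
    """Convert a Chao digit string such as '35' into IPA tone letters."""
    letters = []
    for digit in chao:
        if digit not in CHAO_TO_LETTER:
            raise ValueError(f"not a Chao pitch level (1-5): {digit!r}")
        letters.append(CHAO_TO_LETTER[digit])
    return "".join(letters)


if __name__ == "__main__":
    # Print each contour next to its tone-letter spelling.
    for contour in ("55", "35", "214", "51"):
        print(contour, chao_to_tone_letters(contour))
```

Each digit becomes exactly one tone letter, so a three-digit contour like 214 comes out as three letters in a row.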
Here are the four full tones, using ma as the example syllable:
| Tone | Pinyin mark | Chao number | IPA tone letters | Pitch movement | Example |
|------|-------------|-------------|------------------|----------------|---------|
| 1st | mā | 55 | ˥˥ | High, flat, held | 妈 "mother" |
| 2nd | má | 35 | ˧˥ | Rising from mid to high | 麻 "hemp" |
| 3rd | mǎ | 214 | ˨˩˦ | Dips low, then rises | 马 "horse" |
| 4th | mà | 51 | ˥˩ | Sharp fall from high to low | 骂 "scold" |
A few things worth knowing about that table. The first tone is steady — think of holding a single high note, not letting it sag. The second tone is the "huh?" rise English speakers already make when surprised, which makes it deceptively easy in isolation and deceptively hard mid-sentence. The fourth tone is a firm, sharp drop, like an irritated "No!"
The third tone is the troublemaker. Its full 214 dipping shape (down then up) mostly appears when the syllable is said alone or at the end of a phrase. In running speech it usually flattens into a low tone around 21 — sometimes called the "half third tone" — without the final rise. Beginners who chase the full dip on every third tone end up sounding robotic.
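As a rough sketch of that rule of thumb, the function below assigns the full 214 contour only to a phrase-final third tone and the half third tone (21) everywhere else. It is a deliberate simplification: it ignores tone sandhi and the neutral tone, and the names `CITATION_CONTOUR`, `HALF_THIRD`, and `phrase_contours` are invented for this example.

```python
# Citation (isolation) contours as Chao numbers.
CITATION_CONTOUR = {1: "55", 2: "35", 3: "214", 4: "51"}
# Low variant of the third tone used in running speech, without the final rise.
HALF_THIRD = "21"


def phrase_contours(tones: list[int]) -> list[str]:
    """Map the tone numbers of one phrase to approximate Chao contours.

    Simplification: a third tone keeps its full dip only when it is the
    last syllable of the phrase; tone sandhi is not applied here.
    """
    last = len(tones) - 1
    contours = []
    for i, tone in enumerate(tones):
        if tone == 3 and i != last:
            contours.append(HALF_THIRD)
        else:
            contours.append(CITATION_CONTOUR[tone])
    return contours
```

For example, `phrase_contours([3])` gives `["214"]` for 马 said alone, while `phrase_contours([3, 1, 2])` gives `["21", "55", "35"]` for 我喝茶 (wǒ hē chá, "I drink tea"), where the opening third tone stays low.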
The Neutral (Light) Tone
There is a fifth category that the four-tone framing hides: the neutral tone (轻声, qīngshēng), sometimes called the light or zeroth tone. It has no fixed contour of its own. Its pitch is short, weak, and determined by whatever tone came before it.
You see it on grammatical particles and the second half of some compounds. The question particle 吗 (ma) is neutral — and notice that this ma is not in the minimal set above precisely because it carries no tone. In Pinyin it is written with no tone mark at all: māma (妈妈, "mom") has a full first tone on the first syllable and a neutral tone on the second.
The neutral tone matters because over-pronouncing it — slapping a full tone on a particle that should be light — is a common giveaway of non-native speech.
Tone Sandhi: When Tones Change Each Other
Tones do not sit in isolation. When syllables collide, some tones shift. This is called tone sandhi, and three rules cover most of what a learner meets early.
Two third tones in a row. When a third-tone syllable is followed by another third tone, the first one is pronounced as a second (rising) tone. The textbook example is 你好 (nǐ hǎo, "hello"): written with two third tones, spoken as ní hǎo. The Pinyin spelling does not change — the rule lives in the pronunciation.
不 (bù, "not"). Normally a fourth tone. But before another fourth-tone syllable it becomes a second tone: 不是 is spoken bú shì ("is not"), not bù shì.
一 (yī, "one"). This one shifts in two directions. Said alone or at the end of a word it keeps its first tone (yī). Before a fourth tone it becomes a second (rising) tone: 一样 → yíyàng ("the same"). Before a first, second, or third tone it becomes a fourth (falling) tone: 一天 → yìtiān ("one day").
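The three rules above can be sketched as a small function that takes the written tones and returns the spoken ones. Treat it as an illustration under stated assumptions, not a full sandhi engine: `Syllable` and `spoken_tones` are hypothetical names, the input is treated as a single word or short phrase, runs of three or more third tones are handled only pair by pair (real speech depends on phrasing), and a following neutral tone leaves 一 and 不 unchanged.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    hanzi: str
    tone: int  # written tone: 1-4 for full tones, 0 for neutral


def spoken_tones(syllables: list[Syllable]) -> list[int]:
    """Return the spoken tone of each syllable after the three sandhi rules."""
    result = []
    for i, syl in enumerate(syllables):
        # Written tone of the following syllable, or None at the end.
        next_tone = syllables[i + 1].tone if i + 1 < len(syllables) else None
        tone = syl.tone

        if syl.hanzi == "一" and tone == 1 and next_tone is not None:
            if next_tone == 4:
                tone = 2  # 一样: yī -> yí before a fourth tone
            elif next_tone in (1, 2, 3):
                tone = 4  # 一天: yī -> yì before tones 1, 2, 3
        elif syl.hanzi == "不" and tone == 4 and next_tone == 4:
            tone = 2      # 不是: bù -> bú before a fourth tone
        elif tone == 3 and next_tone == 3:
            tone = 2      # 你好: first third tone -> second tone
        result.append(tone)
    return result
```

Calling `spoken_tones([Syllable("你", 3), Syllable("好", 3)])` returns `[2, 3]`, matching the spoken ní hǎo, while the written tones in the input stay untouched, just as the Pinyin spelling does.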
Sandhi is exactly the kind of thing Pinyin spelling hides and IPA can expose — the written tone mark and the spoken tone come apart. If you have read why spelling lies about pronunciation, this will feel familiar: the page and the mouth disagree, and you need a notation that records what the mouth actually does.
Pinyin Versus IPA: What Each One Is For
Pinyin is a romanization — an official spelling system for Mandarin. It is excellent for typing, for ordering syllables, and for marking tones with its four diacritics (ā á ǎ à). But Pinyin spellings hide real phonetic detail behind familiar Latin letters.
Take the letter i. In mǐ (米, "rice") it is a true high front vowel /i/. But in shí (十, "ten") the same letter stands for a completely different "buzzing" vowel, produced with the tongue still in the retroflex position of the preceding consonant, and it sounds nothing like English "ee." Pinyin writes them the same; IPA does not. Similarly, Pinyin zh, ch, sh are retroflex consonants /ʈʂ/, /ʈʂʰ/, /ʂ/, made with the tongue curled back, which the Latin digraphs do not telegraph to a newcomer.
So the two systems do different jobs:
- Pinyin tells you how to type the syllable and which of the four tone marks it carries.
- IPA tells you the actual articulation — the precise vowel, the retroflex consonant, and the pitch contour written as tone letters.
For tones specifically, the cleanest representation pairs the Pinyin mark with the Chao number or IPA tone letters, because there is broad agreement on the contours: 妈 mā is [ma˥˥] and 马 mǎ is [ma˨˩˦]. IPA also has tone diacritics, but they do not line up with Pinyin's: in IPA the acute accent marks a high tone and the grave accent a low tone, so mā is written /má/ and mǎ is written /mà/. Tone letters avoid that clash. (Sources differ slightly on exact realized values; Beijing speakers often measure closer to 44, 24, 212, 52. When in doubt, trust the Pinyin tone category and the Chao contour over any single "official" IPA value.) This is the same Pinyin-plus-IPA pairing we use in our tour of IPA across 14 languages.
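To make the pairing concrete, here is a minimal sketch that reads the tone off a Pinyin syllable and appends the matching tone letters. It relies only on Python's standard `unicodedata` module. The helper names `split_tone` and `annotate` are hypothetical, and the sketch handles tone only: the letters stay Pinyin, so it does not convert vowels or consonants to IPA.

```python
import unicodedata

# Combining diacritics used by Pinyin, mapped to tone numbers:
# macron, acute, caron, grave.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}
# Citation contours from the four-tone table, as IPA tone letters.
TONE_LETTERS = {1: "˥˥", 2: "˧˥", 3: "˨˩˦", 4: "˥˩"}


def split_tone(syllable: str) -> tuple[str, int]:
    """Split 'mǎ' into ('ma', 3); unmarked syllables get tone 0 (neutral)."""
    tone = 0
    bare = []
    # NFD separates each base letter from its combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            bare.append(ch)
    # NFC recombines what is left, so the two dots of ü survive.
    return unicodedata.normalize("NFC", "".join(bare)), tone


def annotate(syllable: str) -> str:
    """Append tone letters to the toneless Pinyin; segments are NOT converted to IPA."""
    bare, tone = split_tone(syllable)
    return bare + TONE_LETTERS.get(tone, "")
```

With this, `annotate("mǎ")` yields `ma˨˩˦`, and a neutral-tone syllable such as `ma` passes through unchanged because it carries no mark to convert.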
Why English Speakers' Habits Sabotage Their Tones
The problem is not that English speakers cannot produce these pitches. You already make every one of them. The problem is that your brain has spent decades wiring pitch to intonation — emotion, emphasis, sentence type — and that wiring fights you.
Three habits do the most damage:
Question-rise leakage. English raises pitch at the end of a question. When a learner asks something in Mandarin, that rise sneaks in and flattens or reverses the lexical tone on the final word. Research on first-year learners shows tone production stays shaky at the sentence level long after it looks solid on isolated syllables — precisely because intonation and lexical tone are competing for the same pitch channel.
Emphasis stress. English speakers stress the "important" word by raising and lengthening it. Do that to a Mandarin syllable and you can overwrite its tone — emphasizing a fourth-tone word by pushing the pitch up directly contradicts its falling contour.
Treating the third tone as a dip everywhere. Because the 214 shape is what tutorials drill in isolation, learners produce the full dip on every third tone, including the ones that should be a flat low tone in connected speech. The result sounds effortful and unnatural.
The fix is not more theory. It is hearing the contour and copying it, in context, enough times that the new pitch-to-meaning mapping overwrites the old one.
A Read, Listen, Save, Drill Workflow
Tones live in the gap between seeing a syllable and hearing it correctly. Closing that gap is a loop, and you can run it on real material instead of flashcard decks of isolated syllables.
- Read in context. Take an actual sentence — a subtitle, a news line, a song lyric. Real input beats syllable lists because it forces you to handle sandhi and the neutral tone.
- See the IPA. Select the word and look at its transcription with tone letters, so you know what pitch movement to aim for before you hear it. With IPAtics, you select any Mandarin text, press one hotkey, and see the IPA with tone marks plus per-phoneme tooltips for the retroflex consonants and tricky vowels. You can also transcribe Mandarin in your browser without installing anything.
- Listen and imitate. Hear it spoken, then copy the contour out loud. Exaggerate at first — overshoot the rise, overshoot the fall — then dial it back.
- Save and drill. Push the words you keep missing into a spaced-repetition deck. A deck that pairs the character, the Pinyin with tone mark, and the IPA contour turns tone into something you review, not something you guess. Our Anki pronunciation deck workflow walks through building exactly this.
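One way to get such cards into a spaced-repetition tool is a tab-separated text file, a format Anki can import. The sketch below uses only Python's standard `csv` module. The three-column layout (character, Pinyin, toned syllable) and the name `cards_to_tsv` are assumptions made for this example, not a description of any specific deck format.

```python
import csv
import io

# Character, Pinyin with tone mark, and the syllable with IPA tone letters.
CARDS = [
    ("妈", "mā", "ma˥˥"),
    ("麻", "má", "ma˧˥"),
    ("马", "mǎ", "ma˨˩˦"),
    ("骂", "mà", "ma˥˩"),
]


def cards_to_tsv(cards) -> str:
    """Serialize (character, pinyin, ipa) rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(cards)
    return buffer.getvalue()


if __name__ == "__main__":
    # Write UTF-8 so the tone letters and characters survive the import.
    with open("tones.tsv", "w", encoding="utf-8", newline="") as handle:
        handle.write(cards_to_tsv(CARDS))
```

Keeping one row per word means each field can be mapped to its own card field at import time, so the contour can sit on the answer side while the character sits on the prompt side.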
The point of the loop is repetition with feedback: see the target contour, hear the target contour, produce it, check it, repeat. That is how the pitch-to-meaning rewiring actually happens.
Trying It Yourself
Tones are a perception-and-production loop, and the fastest way to close it is to see the pitch contour next to the word right when you are reading it — not after a dictionary detour.
IPAtics gives you instant IPA transcription with tone marks for Mandarin and 13 other languages, all from one hotkey. Or transcribe text in your browser right now without installing anything.
Frequently Asked Questions
How many tones does Mandarin have?
Standard Mandarin has four full tones plus a neutral (light) tone, giving five tonal categories. The four full tones are high-level (1st), rising (2nd), dipping/low (3rd), and falling (4th). The neutral tone has no fixed pitch and takes its short, weak pitch from the syllable before it.
What is tone sandhi in Mandarin?
Tone sandhi is when a tone changes because of the tone next to it. The main rule: when two third tones come in a row, the first is pronounced as a second (rising) tone, so 你好 nǐ hǎo is spoken ní hǎo. The words 不 (bù) and 一 (yī) also shift tone depending on what follows them.
What are the Chao tone numbers for Mandarin?
The four tones are usually written 55, 35, 214, and 51 on a 1-to-5 pitch scale where 1 is lowest and 5 is highest. Tone 1 is high and flat (55), tone 2 rises mid-to-high (35), tone 3 dips low then rises (214), and tone 4 falls sharply (51). These numbers were introduced by Yuen Ren Chao.
Are Mandarin tones hard for English speakers?
They are hard at first because English uses pitch for emotion and emphasis (intonation), not to change word meaning. English speakers can physically make every Mandarin pitch — the difficulty is rewiring pitch to carry lexical meaning, especially inside full sentences where question-rise and stress habits interfere.
Do tones actually change the meaning of words?
Yes. Tones are phonemic in Mandarin, meaning a pitch change creates a different word. The syllable ma is "mother" (mā), "hemp" (má), "horse" (mǎ), or "scold" (mà) depending only on its tone. Saying the wrong tone can produce a different word, not just an accent.
What is the neutral tone in Mandarin?
The neutral tone (轻声, qīngshēng) is a fifth, toneless category found on grammatical particles and some compound endings. It is short and weak, and its pitch is determined by the preceding syllable. In Pinyin it is written with no tone mark, as in the second syllable of 妈妈 (māma, "mom").
Why does the third tone sometimes sound different?
The full third tone (214) dips down and then rises, but that complete shape mostly appears when the syllable is alone or at the end of a phrase. In connected speech the third tone usually flattens into a low tone around 21 — the "half third tone" — with no final rise. Producing the full dip everywhere sounds unnatural.
Should I learn Mandarin tones with Pinyin or IPA?
Use both. Pinyin's tone marks (ā á ǎ à) are the everyday system for typing and reading. IPA tone letters and Chao numbers add precision about the actual pitch contour and reveal sounds Pinyin hides, such as the retroflex consonants and the buzzing i in shí. Pairing Pinyin with IPA gives you the most accurate picture of a syllable.
What is the difference between Pinyin and IPA for Mandarin?
Pinyin is the official romanization for typing and spelling Mandarin; it marks tones but uses familiar Latin letters that gloss over phonetic detail. IPA is a universal phonetic notation that records the exact articulation — precise vowels, retroflex consonants, and tone contours written as tone letters. Pinyin tells you what to type; IPA tells you what to do with your mouth.
Related reading: How to Read IPA: A Complete Beginner's Guide · IPA Across 14 Languages · Building an Anki Pronunciation Deck