I Know What I Hear. Or do I?

First and foremost, I hope you, everyone reading this, and your loved ones, are healthy, safe, and sound during this difficult time.

Some disclaimers. (1) This article contains opinions. My opinions. I do not expect anyone to share them, though some might. My only hope is that it initiates thoughtful (in both senses of the word) discussion. Feel free to disagree, but please don’t start an argument just for the sake of arguing because you are bored. (2) I am not an engineer. If I write something that is factually incorrect, please do not hesitate to correct me; that is how I learn.

Musical reproduction, especially in the stratified world of high-end audio, is about illusion. We audiophiles (if you are reading this, it seems a safe bet you are an audiophile) are willful participants in the deception, as it adds to our enjoyment of the music. For example, we want to be tricked into believing (or at least, imagining) that the musicians are in our listening room, or we are in the performance space (more on that below). We want to be fooled into believing (for example) that the vocalist or lead guitarist is stage center; that the drums, bass, piano, strings and horns are each off to a side; that the vocalist has the expected height, and so on, when of course the sound is coming from two (or more, in the case of multi-channel systems) speakers.

However, there are many times when the deception is less obvious, when our sensory systems “trick us” without our knowing it. To be clear, I am not suggesting it is a conspiracy, or some terrible evil being foisted on humankind. It is not. This article is intended for those who are intellectually intrigued by the inner workings of audio reproduction and, more broadly, for any audiophiles who hope to sharpen their listening skills and, as a result, improve their playback systems.

MOST HIGHER-ORDER SENSORY PROCESSING IS SUBCONSCIOUS

It is human nature to believe our senses. There is likely an evolutionary reason for this: Early hominins would not have survived very long if they stopped to ponder “gee, I wonder if that lion I see (or hear, or smell) is real.” A far better strategy was to take perceived threats seriously, and then reflect on them at a later (and presumably, safer) time. Another problem facing our early ancestors (and for that matter, all animals) is the sheer volume of sensory information with which our nervous system must cope. (As an aside, this has reached staggering levels in recent times, likely with detrimental effects to our psychological and physical health.) Our brains have developed complex mechanisms to separate the wheat from the chaff, by masking/filtering out that which is deemed irrelevant (or at least, less relevant). Importantly, and germane to audio, this is all done at the subconscious level. A well known and fascinating example is the “Cocktail Party Effect” which allows us to focus on one conversation, to the exclusion of often extensive “background noise.”

Given the extraordinary mechanisms by which our brains process sensory information, I think it is reasonable, and hopefully instructive, to ask ourselves the following question: Are our senses to be trusted? A detailed analysis of this could occupy an entire graduate-level course, and is beyond the scope of this article (and quite frankly, beyond my own scant knowledge base). I will however endeavor to provide a few simple examples to show that in fact, our senses are often not as reliable as we would like to think.

VISION

The best known example of our senses deceiving us is the visual blindspot. The optic disc is a region of the retina (at the back of the eye) at which the optic nerve exits the eye on its way to the brain. The optic disc contains no photoreceptors (i.e., rods and cones), and thus does not respond to light. A property of our eye is that light from each portion of the visual world is focused on, and thus stimulates, one region of the retina. This one-to-one correspondence is conveyed back to the brain, and underlies our ability to visually localize objects. Because the optic disc lacks photoreceptors, we are unable to see that portion of the world, light from which is focused on the optic disc. We thus have a literal (as opposed to figurative) blind spot, which is quite easily demonstrated.

But remarkably, we have no conscious awareness whatsoever of the blind spot. How is this possible? In simple terms, our brain “fills in” the missing information, based on the immediate surrounding areas. (How it does this is anything but simple.) Most regard this as little more than a curiosity, with little regard for its larger (and profound) significance.

Optical illusions, of which there are many, are fun to look at. But as with the blind spot, they reveal how easily our visual system can be confused, and not trusted.

So-called “magic” tricks are, of course, illusions. While some rely on complex props with hidden doors, compartments and the like, most rely on sleight of hand, in which the illusionist tricks the observer’s visual system. Here is one of many examples showing “how it is done”: Is Seeing Believing

Far more serious are the numerous studies demonstrating the unreliability of witness testimony. Multiple witnesses to crimes will provide vastly different descriptions of the alleged perpetrator, the “get-away car,” etc. The visual system is not nearly as fool-proof as we like to think.

Also within the area of forensics are studies showing how witnesses’ memories of an alleged crime can be altered by suggestion, often with devastating consequences. A well-known example is the day-care sex abuse scandal. Though this focused mostly on the testimony of children, adults are also susceptible to such manipulation. For discussions, see here and here. It is said that seeing is believing. More correctly, this should be “seeing is believing, but what we believe may be wrong.” Please take a moment to ponder that; I’ll wait. 🙂

HEARING

Let’s turn now to hearing, which is of course near and dear to audiophiles. Though I don’t know how this could be measured, my impression is that the auditory system is in some ways less trustworthy than the visual system, at least with the tasks that audiophiles ask of it. Part of the problem is the nature of sound itself, independent of our auditory system. For example, if we want to compare two colors, or two shapes, we can put them (or even pictures of them) next to one another. This cannot be done with sound. Moreover, and despite the protestations of many audiophiles, auditory memory is notoriously short. Making matters worse, even a small difference in volume can dramatically alter our perception. In comparing gear we should level-match, though I openly confess to not doing so.

One of the most interesting examples of the way the auditory system can be fooled is the McGurk Effect. We first hear a man saying “ba.” Then the picture changes and we hear him saying “fa.” But in fact, the sound is always ba; we “hear” fa because that is what his mouth and lips suggest he is saying. Remarkably, even when the secret is revealed, we still hear fa. But all one has to do is close one’s eyes or look away, and it immediately returns to “ba.” The explanation for this phenomenon is that our visual system “over-rides” our auditory system. So do we really know what we hear? Certainly not in all cases. (As an aside, those who think their evaluations of audio equipment are not influenced by the visual appearance of the gear, let alone previously seen advertisements, reviews, etc., are likely fooling themselves.)

And finally, let’s consider some aspects more directly related to audiophilia. Things might get a bit dicey, so let’s first take a collective deep breath.

THE ANALOGUE SOUND

There are obviously many approaches to the design of gear used for audio reproduction. (I use the term “gear” to include both speakers and electronics.) Each approach has its proponents and opponents, but the two most contentious areas are tubes vs. solid state, and analogue (tape and LP) vs. digital. At this point, I will have to take some liberties and generalize. Those who favor analogue typically report that it sounds more natural than digital, with digital sounding — well, digital. Sound is an analogue phenomenon, and digital’s shortcomings are usually attributed to the fact that a digital signal cannot (at least currently) be perfectly reconstructed to a smooth analogue waveform. It is also claimed that digital filters (which are eschewed by certain designers and their devotees) produce high frequency noise. Last, certain digital filters produce pre-ringing, a phenomenon in which a transient is preceded by a series of small spikes. In nature we experience post-ringing — echoes of sorts — but to my knowledge, never pre-ringing. It thus seems quite reasonable that digital sounds less natural than analogue. But are these purported deficits of digital the only — or even the predominant — phenomena responsible for the alleged superiority of analogue?

I have wondered about this for a number of years, but my interest was especially piqued when audiophiles began digitizing their LPs — commonly referred to as “needle drops.” In discussing the needle-drop phenomenon with VPI founder Harry Weisfeld, he shared an interesting experience. He had purchased three different formats of an album — tape, LP, and 192 — all made from the same original Master. Harry digitized the two analogue copies, then had his listening group listen to all three digital files. Without exception, the group’s first preference was for the analogue tape, followed by the LP, with the digital coming in last. This, in and of itself, is not necessarily surprising, and does not seem to shed light directly on why analogue sounds superior. All analogies are flawed, and the one that follows is especially so, but hopefully it will suffice to make my point.

Let’s suppose we have three automobiles, and we want to rate their handling. (I am well aware that “handling” is not a single parameter, but let’s ignore that for now.) The Ferrari 488 ranks first, the BMW 330i ranks second, and the Honda Accord ranks third. I want to do something to negatively impact their performance, just as digitizing is claimed to negatively impact playback quality, so on all three cars I switch their rubber to run-of-the-mill tires from Sears. As expected, all three cars handle worse than they did before the tire swap, but their relative handling performance is unchanged (i.e., Ferrari better than BMW better than Honda). From this we can conclude that the Sears tires are “a negative” that decreases handling performance, but also that there are positive properties of the cars themselves that cause one model to handle better than another.

But here is where it gets interesting. Many people — myself included — who have listened to needle drops, report that they retain some of the “organic” quality of analogue. If the superiority of vinyl was due entirely to the avoidance of the “negative” effects of the A-to-D and/or D-to-A process, then the needle drops should lack the “analogue quality” of the original LP. Though there are a number of potential explanations for this result (see below), I feel that the most parsimonious is that analogue is not just avoiding something bad (i.e., digitization), but is “adding” something positive, and that positive feature is retained after the A-to-D and D-to-A conversions. What could that “positive” effect be? I can’t say with certainty — and neither I suspect, can anyone else — but my gut and my ears suggest that “positive” feature is distortion. Pleasant distortion to be sure, but distortion nonetheless. What I find intriguing — and germane to this essay — is that (assuming it is a distortion) most of us do not hear it as a distortion per se, but instead perceive it as being more natural — more organic — more “life-like.”

(I should add that most everyone agrees that all turntables do not sound alike. I am not suggesting that to sound good, a turntable need merely add distortion. As should go without saying, there are many kinds of distortion, and some are clearly detrimental. The VPI HW40 (full disclosure: I am a VPI dealer) has superb speed control, is highly resistant to vibration, and has many other design elements I do not understand, all of which add up to extraordinary sound. But it certainly “sounds like” a turntable in the best sense of that term, which leads me to believe that the “analogue sound” is inherent in vinyl playback. I often wonder how much of that sound is introduced when the record is cut/pressed, and how much results from the cartridge following the grooves. I have not had the opportunity to hear an optical cartridge, which might shed some light (pun unintended) on how much of the analogue sound is due to mechanical cartridges.)

When I shared an early draft of this article with Harry, he offered an alternative explanation for the perceived superiority of needle drops vs. CDs. If Harry is correct, it would negate my hypothesis; that is fine, as my goal is not “to be right,” but rather, to foster discussion and hopefully increase our understanding and knowledge. Harry suggested that the ability of needle drops to retain the “analogue quality” of the original recordings, as compared to CDs which lack that quality, might be due to the fact that needle drops are made with a relatively simple A-to-D process, whereas digital recordings entail the use of the far more complex circuitry present in recording consoles. In other words, analogue is not adding an artifact, and digital is not inherently flawed; rather, it is the complexity of digital recording circuitry that is flawed, and those flaws in some way negate certain desirable properties of sound that are effectively reproduced through analogue gear. I wonder if this is something that could be tested, as it might have the potential to substantially improve digital recordings.

As they used to say in the old television commercials for Ginsu knives, “but wait, there’s more.” There is actually an “experiment” we could conduct that might provide the most direct answer to this question. Specifically, it would involve listening to a variety of LPs that were originally recorded digitally. I suspect most listeners will find that they do not sound as good as LPs that were recorded purely analogue, but that is not the main issue. What is germane is whether or not they have the “analogue sound.” If they do, it would provide compelling evidence that LP playback is in fact adding something (a coloration) to the sound. But no matter the outcome, I want to make clear that I really like the sound of analogue. My VPI Avenger Reference turntable with Ortofon A95 cartridge, feeding the Merrill Audio Jens phonostage, is, to my ear, incredible. I do however want to know why. 🙂

Harry is not only one of the nicest people I have had the pleasure to know, he’s also one of the most open-minded. Though Harry prefers analogue to digital (duh…), he uses digital room correction in his home system, as well as some of the systems at the VPI House. That’s right; he plays back LPs from one of his world-class turntable rigs, or from his reel-to-reel player, and then digitizes it! As discussed above, it is unclear if the analogue playback added a distortion that persists after A-to-D conversion, digital processing, and the subsequent D-to-A conversion. However, the complexity of the digital room-correction circuitry (which, by the way, Harry does with a moderately priced Yamaha receiver) as compared to a simple A-to-D as in a needle drop, would seem to argue against his idea of the problem with CDs being the complexity of the digital circuitry used in recording, but for now it remains (to my mind) an unanswered question. I should add that as I’ve come to expect from Harry, he does not dismiss the notion that analogue might add distortion. Perhaps both of our ideas contribute to the sound. At this point the only thing I think we can safely conclude is that our auditory system cannot dissect exactly what is going on, which is of course the main thesis of this article!

THE TUBE SOUND

Let me switch gears now to the other “great debate”: tubes vs. solid state. Even the most devoted “digitophile” would likely acknowledge that early digital — for want of a better word — sucked. The sound was harsh, brittle, grainy, and fatiguing to the ears. In response, some companies began to offer modifications to CD players or stand-alone DACs. Many of these mods were based on the addition of one or more tubes (though in fairness, some mods sought to improve the power supply, amongst other things). I wish I had a dollar for every time I was told that such tube mods “fixed” the digital problems. I do not dispute that they often did improve the sound, and certainly respect those outcomes. But let’s be real; how can adding a tube “fix” an inherent problem in the digital circuitry? Clearly, it cannot. I would thus again posit that our auditory perception is murky, and often at odds with reality.

And yet another example, along the same line: Many audiophiles use solid state amplification for its high power, low output impedance, and overall stability. But many complain that the sound is “sterile” (whatever that means), and so they add a tube preamplifier. As with the digital mods, they describe this as “fixing” the problem with the solid state amp. Huh?? An upstream tube certainly cannot correct — or more generally, alter — a (supposed) flaw in solid state circuitry. I again do not dispute or begrudge that many prefer the outcome, but I think it worthwhile to consider what the tube preamp is actually doing. Keeping in mind that these are merely my opinions, I submit that the sonic changes result from one or more of the following: blunted transients, limited high frequency extension, higher noise floor which obscures detail, and a pleasant veneer of second-order harmonic distortion. That makes sense to me intellectually, and comports — at least in a general sense — with what I hear. Obviously, others hear differently, and their tastes and preferences are as valid as mine (or anyone else’s). But I do think it is yet another case in which we can reasonably ask “Do we really know what we are hearing?”

THE “HARMONIC ILLUSION”

In describing audio gear, audiophiles often use the term “harmonics,” almost always in a positive sense. For example, an amp or preamp might be described as providing richer, fuller, or more complete harmonics than another amp or preamp. Though I confess to having used the same or similar terminology back in my reviewer days, it now leaves me scratching my head. So that we’re all on the same page, let’s briefly review harmonics.

Sound, including music, almost always involves a fundamental frequency and harmonics, the latter being whole-number multiples of the fundamental. For example, if an instrument produces a fundamental frequency (also known as the first harmonic) of 50 Hz, then the second harmonic will be 50 x 2 = 100 Hz, the third harmonic will be 50 x 3 = 150 Hz, etc. It is of course apparent that the “same” note (e.g., middle C, which is 261.63 Hz) played on different instruments will sound different; they will have different timbres. This is because each instrument differs in the number and relative amplitude of the harmonics. (Instruments also differ in their “timing,” which consists of attack, decay, sustain, and release.) When we hear a fundamental frequency and its harmonics, we (or at least, most of us) do not hear them as distinct sounds. We hear the fundamental tone, but it takes on a quality that is often referred to as warmth, fullness, richness, etc. (Quick but important aside: even-order harmonics are perceived as “warm,” whereas odd-order harmonics are perceived as “cold.”)
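The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the two harmonic "recipes" below are made-up amplitude lists, not measurements of any real instrument, and the function names are chosen for this example.

```python
import math

def harmonic_frequencies(fundamental_hz, count):
    """First `count` harmonics; harmonic 1 is the fundamental itself."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def tone(t, fundamental_hz, amplitudes):
    """Instantaneous value at time t (seconds) of a tone built from harmonics.

    amplitudes[0] scales the fundamental, amplitudes[1] the 2nd harmonic, etc.
    """
    return sum(
        a * math.sin(2 * math.pi * n * fundamental_hz * t)
        for n, a in enumerate(amplitudes, start=1)
    )

print(harmonic_frequencies(50, 3))  # [50, 100, 150]

# Two hypothetical "instruments" playing the same 50 Hz note.
mellow = [1.0, 0.5, 0.1]  # assumed harmonic amplitudes, for illustration only
bright = [1.0, 0.2, 0.6]  # assumed harmonic amplitudes, for illustration only
t = 0.003                 # 3 ms into the note
print(tone(t, 50, mellow), tone(t, 50, bright))
```

Both tones share the same fundamental, yet their waveforms differ at almost every instant because the harmonic amplitudes differ. That difference in recipe is what the text calls timbre.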

Each fundamental frequency is a sine wave. The horizontal axis (x axis, or abscissa) is usually time. In the case of acoustics, the vertical axis (y axis, or ordinate) is typically volume (as in loudness, not as in a quart or liter) or, if referring to an electronic device such as an amplifier, voltage. Thus, at each instant of time, the electronic signal will have a particular voltage. Note that the frequency cannot be determined by looking at a single point; rather, it is necessary to analyze a significant portion of the data plot. This is an important point that we will return to shortly.

Importantly, as we add in the harmonics, the wave form takes on complex shapes: fundamental frequency with added harmonics.

The shape is determined by the number of harmonics, their relative amplitudes, and their phase in relation to the fundamental. Note that though it still looks “sinewave-ish,” it is irregular in terms of both the temporal pattern and the amplitude.

Looking at these complex curves, it becomes even more apparent that one would have to examine a significant portion of the curve to determine the frequencies that make up that complex wave. Clearly, examining a single point would tell us nothing about the underlying frequencies.

Engineers often use a square wave, which is the sum of the fundamental and the odd integer harmonics. I show it only to demonstrate that as a wave gets increasingly complex, it doesn’t look anything like the sinewaves from which it is constructed.
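As a rough sketch of how a square wave emerges from odd harmonics, the snippet below adds sine waves at 1, 3, 5, ... times the fundamental, each scaled by one over its harmonic number (the standard Fourier-series recipe for a square wave). The function name and the 1 Hz test frequency are arbitrary choices for this example.

```python
import math

def square_partial_sum(t, freq_hz, n_terms):
    """Sum the first n_terms odd harmonics (1, 3, 5, ...) of a square wave."""
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1  # odd harmonic number
        total += math.sin(2 * math.pi * k * freq_hz * t) / k
    return 4 / math.pi * total

# Sample a quarter of the way through one cycle of a 1 Hz wave,
# where an ideal square wave of amplitude 1 sits at +1.
for n in (1, 3, 10, 100):
    print(n, round(square_partial_sum(0.25, 1.0, n), 4))
```

With a single term the value is simply that of a sine wave at its peak, scaled by 4/pi. As more odd harmonics are added, the value settles toward the flat top of the square wave.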

Now try and imagine the complexity of the signal from a symphonic orchestra. The sound results from typically 50-100 instruments, including flutes, piccolos, oboes, clarinets, bassoons, contrabassoons, horns, trumpets, trombones (of different sizes), violins, violas, cellos, double bass, various percussions, and often keyboard(s). Each type of instrument has its own spectrum of harmonics, and all the instruments are not in perfect phase. Not surprisingly, the combined output looks nothing like a sine wave; you can see portions of the combined wave here and here.

In the early 1800s, Joseph Fourier showed that a complex periodic wave (“function”) could be described as the sum of an infinite number of harmonics; this sum is known as a Fourier series, and the closely related Fourier Transform is the mathematical tool used to find the component frequencies. (In practical terms, it does not have to be infinite; but the greater the number of harmonics, the closer the sum approaches the actual complex wave.) In simple language, no matter how complex a waveform, it can be “deconstructed” into multiple sine waves that differ in frequency, amplitude, and phase.
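To make the idea of “deconstruction” concrete, here is a minimal sketch that builds a test wave from three known harmonics and then recovers their amplitudes with a hand-written discrete Fourier sum. The amplitudes, phases, and the 256-sample window are assumptions made for the illustration; a real analyzer would use an optimized FFT.

```python
import cmath
import math

def harmonic_amplitude(samples, cycles_in_window):
    """Amplitude of the sine component that completes `cycles_in_window`
    whole cycles across the sample window (one bin of a discrete Fourier sum)."""
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * cycles_in_window * i / n)
        for i, x in enumerate(samples)
    )
    return 2 * abs(acc) / n

# One full period of a test wave: fundamental plus 2nd and 3rd harmonics,
# each with its own (arbitrary) amplitude and phase.
n = 256
wave = [
    1.0 * math.sin(2 * math.pi * 1 * i / n)
    + 0.5 * math.sin(2 * math.pi * 2 * i / n + 0.7)
    + 0.25 * math.sin(2 * math.pi * 3 * i / n + 1.9)
    for i in range(n)
]

for k in (1, 2, 3, 4):
    print(k, round(harmonic_amplitude(wave, k), 3))
```

The analysis should report amplitudes very close to 1.0, 0.5, and 0.25 for the first three harmonics and essentially zero for the fourth. Note that the function needs the whole window of samples to do so.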

Remarkably, our auditory system is able to “make sense” of incredibly complex waveforms. But now let’s consider audio gear, such as amps and speakers. (I am NOT referring to audio analyzers, which are designed to determine the frequencies in a complex waveform by Fourier analysis.) Before I present my position, I must state clearly that I am about to delve into an area in which I have only the most rudimentary knowledge. I will try to not make a fool of myself but as I stated earlier, if I make some glaring errors, please correct me.

When a reviewer states how well a particular amplifier reproduces “the harmonics,” the implication is that other gear does not do so as well. (For the sake of simplicity, in the ensuing discussion I will just say “amplifier,” but it applies equally to preamplifiers, speakers, cartridges, turntables, etc.) But this is where my head scratching begins. To see why, let’s go over some amplifier basics. While an amplifier does effectively “amplify” an audio signal, we should not lose sight of the fact that electricity cannot be amplified per se. (First Law of Thermodynamics: energy can neither be created nor destroyed; it can only be transferred or changed from one form to another.) The most common types of amplifiers used in high-end audio are Class A, Class A/B, and Class D. Though they differ in fundamental ways (especially Class D vs. the other two classes), what they have in common is that they use the audio signal to regulate the power being delivered from the power supply. (For Class A or A/B amps, the audio signal determines the “extent” to which the device (tube or transistor) opens; for Class D, it determines how long the transistor remains open.) But the key point is this: The only information available to the amplifier is the instantaneous voltage at each moment in time. That’s it. Period. Full stop. Please go back to the earlier links to the complex waveforms. Do you “see” the underlying sine waves? Neither do I and importantly, neither does the amplifier. Each instantaneous voltage results from the combined voltages of the underlying fundamental and harmonics at that particular instant, but there are an infinite number of combinations (or even a single sinewave) that could yield an identical instantaneous voltage. A Fourier Transform cannot be performed on an instantaneous data point; it requires an analysis of the entire waveform over a length of time. Amplifiers do NOT do this.

So what I’m getting at is, amplifiers cannot distinguish a fundamental frequency from its harmonics. All the amp can do is take the audio signal at each instant, and amplify it.
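A small sketch may help illustrate the point about instantaneous voltage. The “amplifier” here is a hypothetical, perfectly linear gain stage (a deliberate simplification, not a model of any real circuit), and the frequencies and the chosen instant are picked purely so that the numbers line up.

```python
import math

def sine(freq_hz, t):
    """Value of a unit-amplitude sine wave at time t (seconds)."""
    return math.sin(2 * math.pi * freq_hz * t)

def ideal_amp(voltage, gain=10.0):
    """Hypothetical perfectly linear amplifier: sees one voltage, scales it."""
    return gain * voltage

t = 1 / 1200  # one chosen instant, in seconds

# Three different signals that happen to share the same voltage at time t.
candidates = {
    "100 Hz alone": sine(100, t),
    "500 Hz alone": sine(500, t),
    "50/50 mix": 0.5 * sine(100, t) + 0.5 * sine(500, t),
}
for name, v in candidates.items():
    print(f"{name}: in={v:.3f} out={ideal_amp(v):.3f}")

# Scaling the mix sample by sample scales each component equally,
# even though the amplifier never "knows" what the components are.
t2 = 0.0007
mixed_out = ideal_amp(0.5 * sine(100, t2) + 0.5 * sine(500, t2))
summed_out = 0.5 * ideal_amp(sine(100, t2)) + 0.5 * ideal_amp(sine(500, t2))
print(math.isclose(mixed_out, summed_out))
```

All three signals present the same input voltage at that instant, so the idealized amplifier produces the same output for each. The final check shows the other side of the coin: under the assumption of perfect linearity, amplifying instant by instant leaves the proportions between the components untouched.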

Despite this, audiophiles persist in referring to the “harmonic content” of certain gear. Amps certainly do sound different from one another (if you disagree, you’ve been reading the wrong article), and this might result in a difference in their reproduction of harmonics. In particular, an amp that is bandwidth-limited would not be able to reproduce the highest harmonics, but this would likely affect mostly the higher frequencies, which we perceive as “air.” Some amps have high noise floors, and/or higher distortion, and/or non-linear frequency response; in my opinion, it is these parameters that result in the perception of altered harmonics, when that is not really the problem.

That was a quite lengthy way to support my belief that when audiophiles describe the “harmonic texture” (or some similar terminology), they are actually describing something entirely different. In other words, this is yet another case of “not knowing what they heard.”

And since I’ve already put my head on the chopping block, I’ll come right out and say it: I believe (but cannot prove) that the harmonic “richness” of tube amps, in particular certain types of tube amps, is due to the addition of harmonics (in particular, second-order harmonics) that were not present in the recording. These added harmonics are certainly pleasant, but they are distortion just the same. And before my inbox starts exploding with nasty E-mails, I reiterate that these are just my opinions, based on my own personal tastes.
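How a gain stage could add a second harmonic that was not in the recording can be sketched with a toy model. Assume, purely for illustration, a stage whose output is the input plus a small squared term; because cos²(x) = (1 + cos 2x)/2, squaring a pure tone produces a component at twice its frequency. The coefficient 0.1 is arbitrary and does not describe any actual tube or amplifier.

```python
import cmath
import math

def harmonic_amplitude(samples, k):
    """Amplitude of the component with k whole cycles in the sample window."""
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
    )
    return 2 * abs(acc) / n

def soft_stage(v, k2=0.1):
    """Toy gain stage with a small squared term (k2 is an arbitrary choice)."""
    return v + k2 * v * v

n = 256
pure_tone = [math.cos(2 * math.pi * i / n) for i in range(n)]  # fundamental only
output = [soft_stage(v) for v in pure_tone]

for k in (1, 2, 3):
    print(
        k,
        round(harmonic_amplitude(pure_tone, k), 3),
        round(harmonic_amplitude(output, k), 3),
    )
```

In this toy model the input contains only the fundamental, while the output also contains a second harmonic at roughly 5% of the fundamental's amplitude and no third harmonic. Whether real tube circuits behave this way, and to what degree, is a separate question that this sketch does not answer.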

“YOU ARE THERE” VS “THEY ARE HERE”

Audio reviewers and enthusiasts often make a distinction between “you are there” and “they are here.” If a system accurately reproduces subtle details, one should hear the ambiance of the recording venue (which may of course be a recording booth). I certainly agree that a less revealing system will provide less detail, and thus less realism. What I haven’t experienced are systems that are more “you are there” as compared to other systems that are more “they are here,” though many audiophiles describe this. However, what I certainly have experienced is systems (in particular, speakers) that have a more forward presentation, while others have a more recessed presentation. Could this be what they are actually referring to? Such differences in the “forwardness” of the presentation are, most likely, due to an emphasis or de-emphasis of certain frequencies. However, whatever the cause of the “they are here/you are there,” or forward/not so forward presentation, we are not consciously aware of its cause, thus this is yet another example of not being able to trust our ears. (As an aside, I suspect that recording engineers are more astute at detecting frequency non-linearities. Of course, they don’t seem to hear things that many of us do, such as audible differences in power cables, but that is a story for another day.)

Similarly, as is well known to those who have experimented with EQ, a small (as little as 1 dB) boost in the presence region will give the impression of greater detail. In my own experience this is something I don’t recognize immediately, but it becomes more apparent with extended listening. I’ve often said that the ears are good detectors, but terrible measurers.
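For a sense of scale, the standard decibel formulas can be used to see how small a 1 dB change is in terms of signal level. This is just the textbook conversion; the function names are chosen for this example.

```python
def db_to_amplitude_ratio(db):
    """Voltage (amplitude) ratio corresponding to a change in decibels."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power ratio corresponding to a change in decibels."""
    return 10 ** (db / 10)

print(round(db_to_amplitude_ratio(1.0), 3))  # 1.122
print(round(db_to_power_ratio(1.0), 3))      # 1.259
```

A 1 dB boost therefore corresponds to roughly a 12% increase in voltage, or about 26% more power, in the affected frequency range.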

FINAL THOUGHTS

The study of human perception encompasses many disciplines, including (but not limited to) neuroanatomy, neurophysiology, cognitive science, and philosophy (qualia, such as “redness”).

By its very nature, audiophilia is inextricably linked to sensory perception. As I’ve tried to show in this brief essay, our perceptions are notoriously unreliable. Audiophilia is also inseparable from illusion; some of those illusions are intended (soundstage, for example), while others are not. It is my belief that a better understanding of human perception, and of misperception, will lead us to better equipment, better reproduction of recorded music, and thus a better emotional connection to music.

Laurence A. Borden, President, Distinctive Stereo LLC

info@distinctivestereo.com