Deep Dive on Qualcomm Snapdragon DSP

• The Hexagon DSP gets wider, faster, and more powerful
• Smartphone cameras will be faster, with more image processing capabilities
• Qualcomm moves the sensor hub inside for always-on features

Qualcomm has long been a leader in building truly heterogeneous SoCs, which include a 64-bit CPU, a powerful GPU, and dual ISPs; it also includes several DSPs, under the Hexagon brand. Other SoC builders have DSPs in their chips but use them primarily for audio or modem functions. Qualcomm does that as well, and also dedicates one to video and image processing.

DSPs use a wide word and are often referred to as very long instruction word (VLIW) devices. Some folks say the “L” really stands for “large.” VLIW devices can run as parallel SIMD processors and have been used as floating-point processors in graphics machines. With the power of a VLIW device also comes a complex programming environment, although TI and others have developed some very clever compilers to take some of the drudgery out of explicitly programming the processor. Qualcomm also has such tools for its OEM customers.

What’s new in the Snapdragon 820 is the extended Hexagon DSP, which Qualcomm has designated the 680. The company is employing the signal processing capabilities of the 680 with the two Spectra ISPs in the Snapdragon, which makes for a powerful and very fast image processing system—things that used to fill a 2U rack 10 years ago now fit in your pocket.

Designated the Hexagon Vector eXtensions (HVX), the new capability expands the SIMD width from 64 bits to a 1024-bit, four-vector-slot VLIW producing 4096 result bits per cycle. The device can be configured (via its VLIW instruction slots) to run 256 8×8 multiplies or 64 16×16 multiplies per cycle. It also has 32 1024-bit vector registers, which can operate on 8-, 16-, or 32-bit fixed-point data. The wide vector units support only fixed-point operations, a consequence of their focus on image processing, where most data is 16-bit fixed-point or less. Note that only the Hexagon Vector eXtensions are fixed-point only, not the entire core.
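To make the lane arithmetic concrete, here is a toy scalar model (plain Python, purely illustrative; real HVX code is written with C intrinsics) of how a 1024-bit vector register divides into fixed-point lanes:

```python
# Toy scalar model of one 1024-bit HVX-style vector register.
# Illustrative only; actual HVX programming uses C intrinsics.

VECTOR_BITS = 1024

def lanes(elem_bits):
    """Number of SIMD lanes for a given element width."""
    return VECTOR_BITS // elem_bits

def vector_multiply(a, b, elem_bits):
    """Element-wise fixed-point multiply across all lanes.
    Products of n-bit operands need up to 2n bits, so results
    are kept at double width, as real SIMD multiplies do."""
    assert len(a) == len(b) == lanes(elem_bits)
    mask = (1 << (2 * elem_bits)) - 1
    return [(x * y) & mask for x, y in zip(a, b)]

print(lanes(8))    # 128 8-bit lanes per 1024-bit register
print(lanes(16))   # 64 16-bit lanes per register
```

With four VLIW vector slots issuing per cycle, several such vector operations can retire together, which is where the aggregate figure of 4096 result bits per cycle comes from.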

Because a DSP is such a specialized device, it can achieve much of the same accuracy as a conventional CISC floating-point processor, do it much faster, and do it with less power. In imaging, for example, the DSP can generate results roughly 3× faster at roughly 10× lower energy than a quad-core CPU. The Hexagon 680 does support 32-bit floating point on the scalar portion of the core, which also still performs 64-bit SIMD concurrently with the 1024-bit SIMD.

It is important to remember this wide vector capability is an extension of the core. That is, the core retains all its previous capabilities, concurrently supporting 64-bit SIMD for 8/16/32-bit fixed point as well as 32-bit floating point. The DSP’s special ISA offers sliding-window filters, LUTs, histograms, and performance sufficient for UHD video or post-processing of 20-Mpixel camera bursts, and more.
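To illustrate what these ISA features do, here is a plain-Python sketch of a histogram pass and a LUT remap over 8-bit pixel data; the sample data is hypothetical, and a real implementation would run these per-pixel operations across the wide vector lanes rather than a scalar loop:

```python
# Sketch of two classic imaging kernels a wide-vector DSP accelerates:
# a histogram and a look-up-table (LUT) remap over 8-bit pixels.
# Sample data is hypothetical; real code would use HVX intrinsics.

def histogram(pixels, bins=256):
    """Count how many pixels fall in each intensity bin."""
    h = [0] * bins
    for p in pixels:
        h[p] += 1
    return h

def apply_lut(pixels, lut):
    """Remap each pixel through a table, e.g. for gamma or contrast."""
    return [lut[p] for p in pixels]

pixels = [0, 10, 10, 255, 128, 10]
h = histogram(pixels)
invert = [255 - i for i in range(256)]  # simple example LUT
out = apply_lut(pixels, invert)
print(h[10], out[3])  # value 10 occurs 3 times; pixel 255 inverts to 0
```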

Schematically, the 680 looks like most DSPs, as shown in the following diagram. However, its hardware multi-threading support is unique to Qualcomm’s Hexagon DSP.

[Figure: Hexagon 680 DSP schematic]

The 680 can run four parallel scalar threads, each with 4-way VLIW and shared L1/L2 caches, at 500MHz per thread, yielding 2GHz of aggregate scalar performance. That translates to super-scalar performance concurrent with the new wide vector performance.

There are two Hexagon Vector eXtensions (HVX) contexts, controllable by any two scalar threads that will run up to 500MHz per thread, giving 1GHz total vector performance. Other threads can do scalar work in parallel. The net result is a domain specific architecture with a wide 1024-bit SIMD (for pixel data parallelism) and emphasis on low precision fixed-point plus a special ISA. All of that provides a parallel and coordinated capability for scalar and vector threads, which can exploit the large primary cache for imaging working sets.

The 680 takes data from the SoC’s ISP via a L2 cache, and returns image processing and filtering results to the system memory and CPU, as illustrated in the following block diagram. The primary pre-processing path ingests from the Camera Sensor and returns to the Camera ISP, as highlighted in the yellow block. In addition, any supplemental information obtained from the pre-processing can be stored in memory for later use.

[Figure: Hexagon 680 image-processing data-flow block diagram]

The SoC’s ARM-compliant SMMU allows zero-copy data sharing with the CPU. That, in turn, provides a multi-threaded DSP capability that can service multiple offload sessions (concurrent apps for audio, camera, computer vision, etc.). The SMMU supports multiple context banks to allow sharing with multiple different address spaces on the CPU, and it can be used to support processing on secure content managed outside of the HLOS.

So what?

So with a wide-word signal processor in the front end, imaging tasks can run really fast. With ever-increasing sensor resolutions and higher-resolution screens, you need to move pixels from the front end to the screen quickly, and they had better look right when they get there.

Qualcomm compared the DSP with HVX against a quad Krait CPU with full Neon optimization. The quad Krait CPU ran at 2.65GHz; the single DSP/HVX ran at 725MHz. The results are shown in the next diagram.

[Figure: DSP with HVX vs. quad Krait CPU benchmark results]

The point Qualcomm is trying to prove here is that, for the super smartphones that will be coming out in 2016, with mega sensors and big high-res screens, you need more image processing horsepower than you can get from just a CPU, no matter how many cores you jam into that CPU.

Qualcomm actually puts three DSPs in its SoC.

[Figure: The three DSPs in the Snapdragon SoC]

The low-power island DSP is for “always on” sensor processing. This is a major breakthrough for Qualcomm and the industry. Putting the sensor hub inside the SoC saves board space and, most importantly, power. The chip has a new power-management scheme to be “always off” until needed. That gives longer battery life for key use cases (e.g., pedometer or sensor-assisted positioning).

Qualcomm claims to be the first in the SoC market with super-wide vector SIMD extensions for its DSP. It claims they can be exploited through conventional tools and techniques, using shared-memory POSIX-like threads (on the DSP’s RTOS) and an LLVM compiler. This, says the company, allows programming with C/C++ and intrinsics, plus a suite of pre-optimized libraries for common filters and algorithms. What’s not to like?

Taming the Energy of Gaming Computers

Gaming PCs consumed roughly 75 billion kilowatt-hours per year of electricity

By Dr. Jon Peddie, President, Jon Peddie Research, publishers of “Tech Watch” and “Graphics Speak” with Nathaniel Mills of “Greening The Beast” and Evan Mills of Lawrence Berkeley National Laboratory

In the journal “Energy Efficiency”, Evan Mills of the Lawrence Berkeley National Laboratory has published a study that presents a novel analysis of the energy use of gaming PCs. We were able to assist him by providing marketing data and some background information on the gaming markets.

About one billion people around the world today engage in digital gaming. Gaming is the most energy-intensive use of desktop computers and the high-performance “racecar” machines built expressly for this purpose comprise the fastest growing type of gaming platform.

Mills found enormous performance-normalized variations in power ratings among the gaming computer components available in today’s market. For example, central processing units vary by 4.3-fold, graphics processing units 5.8-fold, power supply units 1.3-fold, motherboards 5.0-fold, RAM 139.2-fold, and displays 11.5-fold. Similarly performing complete systems with low, typical, and high efficiencies correspond to approximately 900, 600, and 300 watts of nameplate power, respectively.

While measured power requirements are considerably lower than nameplate for most components we tested (by about 50% for complete systems), the bottom-line energy use is massive compared to standard personal computers.

Based on our actual measurements of gaming PCs with progressively more efficient component configurations, together with market data on typical patterns of use, Mills estimates the typical gaming PC (including display) uses about 1400 kilowatt-hours of electricity per year. The energy use of a single typical gaming PC is equivalent to the energy use of 10 game consoles, 6 conventional desktop computers, or 3 refrigerators. Depending on local energy prices, it can cost many hundreds of dollars per year to run a gaming PC.
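A back-of-the-envelope sketch shows how such an annual figure accumulates; the per-mode wattages and hours below are illustrative assumptions, not the study’s measured inputs:

```python
# Rough annual energy estimate for a gaming PC, summed over usage modes.
# The watts and hours-per-day figures are illustrative assumptions,
# not measured values from the Mills study.

modes = {
    # mode: (average draw in watts, hours per day)
    "gaming":  (400, 7.2),
    "idle_on": (110, 6.0),
    "standby": (5, 10.8),
}

def annual_kwh(modes):
    daily_wh = sum(watts * hours for watts, hours in modes.values())
    return daily_wh * 365 / 1000.0

def annual_cost(kwh, price_per_kwh=0.20):
    # price_per_kwh is an assumed electricity rate, not from the study
    return kwh * price_per_kwh

kwh = annual_kwh(modes)
print(round(kwh), round(annual_cost(kwh)))
```

With these assumed inputs the sketch lands in the same ballpark as the study’s roughly 1400 kWh per year; the actual estimate weights measured component draws by market data on usage patterns.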

While gaming PCs represent only 2.5% of the global installed personal computing equipment base, our initial scoping estimate suggests gaming PCs consumed roughly 75 billion kilowatt-hours per year of electricity globally in 2012, or approximately 20% of all personal desktop computer, notebook, and console energy usage combined. For context, this corresponds to about $10 billion per year in energy expenditures or the equivalent electrical output of 25 typical electric power plants.

Given market trends and projected changes in the installed base, Mills estimates this energy consumption will more than double by the year 2020 if the current rate of equipment sales is unabated and efficiencies are not improved. Although they will represent only 10% of the installed base of all types of gaming platforms globally in 2020, relatively high unit energy consumption and high hours of use will result in gaming computers being responsible for 40% of overall gaming energy use.

This significant energy footprint can be reduced by more than 75% with premium-efficiency components and operations, while improving reliability and performance. This corresponds to a potential savings of approximately 120 billion kilowatt-hours, or $18 billion per year globally, by 2020.

There is a significant lack of current policies to achieve such improvements and very little guidance is available to help consumers make energy efficient choices when they purchase, upgrade, and operate their gaming PCs. Key opportunities include product labeling, utility rebates, and minimum efficiency standards.

You can download the report here.

Imagining the Personal Companion of the Near Future

The fast-developing potential for truly useful applications that can convert our personal companions, our smartphones, from basic input/output devices into truly thinking machines is dazzling, science-fiction-like, and almost staggering in its scope and range of capability. With well over a billion devices deployed, containing a bewildering array of sensors and processing power to design for, thousands of new, useful, and inexpensive applications are being offered daily to exploit these marvelous devices. But most, unfortunately, are still living in the age of Pong: applications designed to mindlessly entertain a user while they are actively interacting with their phone.

A simple litany of these applications would quickly become tedious and boring; and no matter how long the list, it would not reveal the growing influence and power that comes with cognitiveness and connectedness. We’re living in an era of cognitive computing that has the potential to deliver an entirely new type of user experience, one that you don’t need to engage in actively. Technology works best when it’s invisible. For instance, we no longer think about the transmission in our cars; its gears smoothly and unobtrusively deliver the right amount of power at the most economical level. It just works, we take it for granted, and it makes our lives better.

It’s a few years from now and I have an appointment. My personal companion is talking to my car before I even get into it. My companion really cares about me and wants to make sure the car knows where I’m going and that I am alert while I drive. My car is not totally autonomous yet and it’s best if I’m awake and alert when driving. My personal companion will inform the car I went to bed at midnight, have been up since 5:00am, have only had one cup of coffee, spent two hours checking email, and I’m now slumping in my seat with a glazed look on my face. The worst thing in the world for me at this point is to let the interior temperature get too comfortable or to play soothing music. I’ll be sound asleep in 20 minutes and I have an hour and a half drive ahead of me in mixed to heavy traffic, with a strong possibility of a light tapping of raindrops to lull me to sleep.

My personal companion senses this and goes on high alert. Because my device is cognitively proactive, it is already instructing the car’s driver monitoring cameras and steering wheel vibrators to be alert, while at the same time listening to the radio and changing stations at the end of a news story or song. I could have gone a step further and authorized the phone to prevent me from turning off the radio which would instruct the car to limit my speed to the speed limit—that would really annoy me, so I didn’t opt for that cute little feature.
And then there was the little talk we had on the way to the car. My personal companion, having all the data on my morning activities so far, knowing where I’m going and when I have to get back, said, “Jon, do you really have to make this trip? Can you reschedule it? You have an opening tomorrow and Friday.”

As I get out of the car, the glasses (which are paired to my personal companion) have visited the social media sites and shown me the bio and a photo of the person I’m going to meet. They found a lot about her company, her current projects, and the last three she did. I blink and the string of emails that got me to make this trip shows up, and I’m thankful I never had to take my hands out of my pockets on this chilly, rainy day.

Entering the lobby I take off the glasses and approach the counter, while bringing out my phone to show the attendant who it is I am here to see. At the same time, it sends my information to his terminal. He hands me my badge and my smart companion quickly informs me the badge doesn’t allow me to go beyond the lobby – this is going to be a very short meeting. I quickly review the email thread again, and my companion highlights the unresolved issues. It knew not to do it while driving here because it would have taken my attention away. I suppose that’s another reason to buy one of those autonomous cars that will be available next year.

It’s been a long day, so I go to the local pub to meet a friend for a snack and a few drinks. My watch is informing my personal companion my heart rate is a little elevated from the drive home in rush hour traffic and maybe that extra cup of coffee with the client. “Thanks for the update,” I say snidely. My phone’s not judgmental about my tone but I know at times it can and should be. I like that.

Dinner and conversation at the pub is pleasant and the hours slip by when, suddenly, my phone starts saying “Jon” progressively louder. Pretty soon everyone in the pub can hear my little loud-mouthed companion. I try ignoring it but it won’t stop so I look at it and it says, “Please take a breathalyzer test now if you would like to drive your car home today.” It seems my tattletale smart watch has been keeping track of exactly how many times I’ve lifted something to my face. Even more intelligently, based on my heart rate and respiration (again detected by the tattletale smart watch), my personal companion suggests I start drinking water.

Looking around the room, I notice a young woman discreetly dip an iPod-sized device in her drink and then, a few minutes later, excuse herself to go to the toilet. She’s sampled her drink and is going to analyze it. What does that say about what she thinks of her date? Why doesn’t she just leave? Then I remember, people can also check for sugar content with those samplers and she might be diabetic.

I hesitate. I don’t want to fail my own test, so maybe a little water is a good idea. I’d order coffee but even the meditation app in my phone can’t overcome late night caffeine ingestion. My personal companion is on the job already and suggests a little walk would be a good idea. I say good night to my friend and march outside to pace the parking lot and think about the day. I start wondering if I could probably get rid of my doctor, my lawyer, and my mechanic, and have my phone do everything. Maybe I have had too much to drink.

With the new low power consumption processors, my companion is always listening and monitoring. It knows if I’m sitting, walking, running, sleeping, in the office, my car, or a restaurant. Knowing that, it makes recommendations and is able to take independent actions.

That’s all done in the phone along with the security ID stuff that checks my eyes, face, and heartbeat via my watch and, if necessary, my companion will ask to check my fingerprint. I haven’t had to have my fingerprint checked for some time due to the behavior sensing my companion does. Talk about being paired, this thing really knows me.

As I approach home, the lights in the bedroom and hallway come on and the garage door opens as I enter the driveway — no need to push buttons like in the old days. In bed, reading a book on my phone, the lights in the room start to dim. It’s 11:00pm and my companion lets me know it’s time to try to get some sleep. I put the phone down and let its induction charger work while I snooze. It will be a comforting sleep, as relaxation sounds play. I don’t have to worry about the alarm; based on my calendar, my personal companion will wake me at the appropriate time.

The lights come on the next morning and I can smell the coffee. My companion has turned on the machine and signaled my PC to boot up and collect email. These were not timer settings; these were intelligent functions based on what I had to do today including what time I got up. I slept in this morning till 6:15am.

The day went pretty fast, no major problems. The call with the client as suggested by my personal companion really helped. She had come up with a way to get around the major obstacle and, as luck would have it, could use my help with the new approach.

By 6pm, I was ready to knock off and take a little diversion. I’d been working on an idea to catch rainwater, taking the little bit we get from the roof, routing it to a little pond, and then drawing on that water to feed the plants. The grass was long gone, replaced with succulents we maintained to support the birds, bees, and tiny critters that clean up the place at night. I had the idea, but it was hard getting my mental visualization across to the roofing people and I couldn’t express it properly on paper. I took the phone out, pointed it at the sloping, flat-topped roof, and walked the perimeter of the house. My personal companion faithfully captured every detail with its built-in 3D camera, measuring every inch and all the obstacles like doorways, trees, vertical supports, and nearby fences, building an accurate 3D model as we went.

Back in my office, I tossed the phone on the desk and it charged itself. As I opened an app on my computer, the phone automatically connected and I was pulling the 3D model out of it in minutes. Now, with a clever app, I was able to add my design. The app was smart enough to caution me not to make the cantilevered gutter too wide. After about two hours of trial and error, all done with a mouse and some verbal commands, it was finished and perfectly accurate. I hit the print button and a PDF file was created, complete with dimensions and images of the house. It was good enough for framing, or so I thought. I emailed it to the roofer and went on to dinner.

Later I looked at my work of art on my phone. The six-inch 4K screen was fantastic, and the eye-tracking cameras adjusted the image’s perspective as I looked at it, checking different angular views. It was like 3D but without the glasses. When I put the phone in its docking station, it connected wirelessly and automatically to my 4K TV, which I also use for work.

I wasn’t tired and didn’t want to watch TV, so I decided to finish my review of a new AR e-book that had been sent to me. I used to think these were kid’s books but this one was about WWII and tapping on a paragraph would give me a choice of a 3D map or Google street views. With the eye tracker I could look around in the scene. It really made a difference. The book was well written in addition to having a rich database of images behind it.

The night lights came on and my personal companion started playing soft music, a cue to wind things down. I had a big day tomorrow, a new game to test that was more likely to test me.

By the time I got to the office, the mail (yes, we still got regular post office mail) was on my desk. I still looked forward to the mail. It was always a pleasant surprise to look at postage stamps and open an envelope. One of the envelopes was in Chinese, addressed to me in English. Opening it revealed an official-looking letter, entirely in Chinese except for the date and the line with a dollar sign and a nice number in it. I had bid on a project with a major Chinese computer company and I was hoping this was the approval. I picked up my phone and scanned the document. It was immediately translated, storing the results as I read so I could print it out later. This was a happy day. I did get the contract; now I had to do the job. I looked over the document to see if they had accepted my delivery date. They hadn’t; they had moved it up two weeks. That was annoying and, as I was working up my anger, my phone grabbed my attention. The screen had turned green and a message in a light yellow italic font floated across the screen: “Relax, you don’t have to accept it, make a counter offer. Take a breath. Stand up. Walk around; your heart rate is elevated.” “Yes, dear,” I say to my personal companion, and take the advice.

Later in the afternoon, after I had thought about it, I came up with what I felt was a reasonable compromise. I started dictating to my phone, having told it I wanted the file to be in English and Chinese. After a few minor corrections, I said, “sign it, send it, and file it.” Done, done, and done.

It was the end of the day and time for a little relaxation. I slid my phone into a housing and then put on the HMD. I was going in, into a 3D world, to kick some dragon butt. The HMD used all, and I mean ALL, the sensors in the phone to locate me, track my eyes, track my head, and even watch where my hands were. The wide-screen phone gave me pretty good peripheral vision, and its super-high-res 4K screen with 120Hz refresh rate and full UHD color gamut made the world look as realistic as possible. Certainly good enough to get me, Mr. Pixel, to suspend disbelief. OK, here, dragon, dragon, come on out, dragon.

Nvidia’s new Android Gaming Console

Nvidia has been steadily investing in, and expanding the capabilities of, their Shield product line combined with their Grid server system. The original product, the Shield Portable, a game controller with an attached screen, was a kind of “toe in the water” for Nvidia to enter into the consumer electronics business. The next product was the Shield Tablet, which also was a consumer electronics product, but Nvidia shared it with their partners (e.g., EVGA).

This week at GDC in San Francisco, Nvidia unveiled its latest Shield product, the Shield Console, an Android TV-like device with a game controller and an optional remote control stick (similar to the Apple TV controller, only black instead of silver).

Nvidia has also set up an Nvidia game store with over 50 games, and users will be able to sign up for a subscription (price TBD, but obviously it will be in line with Amazon and others). For an easy comparison, you could say Nvidia’s Shield console was like Amazon’s Fire TV.

However, Nvidia has been nibbling at milliseconds since it brought out the first Shield Portable, and the differentiation between Nvidia’s Shield console and all other Android TVs/consoles will be response time, latency, and throughput—Nvidia’s will be faster. One reason is that it controls both ends of the pipe: the server and the client.

The company has been, and continues to be, working very closely with game developers, porting lots of favorites as well as new games. These games will run at an “honest” 4K and even better at HD.

The cabinet is lovely to look at, with sharp diagonal lines and multi-reflective surfaces, and is about the size of a thin book: 8 × 5 inches and 1 inch thick.

[Figure: Nvidia Shield Console]

The system, which sells for $199, comes with the Shield processor cabinet, a stand, a game controller, and a power supply. The optional TV controller is $30. The stand, by the way, has an amazingly sticky nano surface on the bottom and, once you put it on a flat surface like a table, it’s hard as hell to pull off. The cabinet slips neatly and firmly into the stand.

In the demo I had at Nvidia’s facilities in Santa Clara, the system was very responsive. I even asked where the server was, assuming it had to be in the room. I was told it was in Seattle. That was a surprise because it was so fast: there was no noticeable latency, no sound mis-sync, and it was driving a full 4K screen wirelessly (the system uses 802.11ac).

What do we think?

This product marks a major step for Nvidia, one it has been building toward for decades — to be a full-service consumer electronics company. Nvidia has been saying for a few years, “We are not a semiconductor company.” Their Grid, Tesla, Quadro, and other products are ample proof of that. Yes, they do sell semiconductors and they also sell components (like the recently announced automotive subsystems). But they really want to be like Apple. And with the Shield console and its ecosystem, they are one step closer to doing that. The main difference between Apple and Nvidia (other than size) is that Nvidia uses a common OS. Nvidia sees it as an advantage in that it can leverage all the app development work, the APIs, and the OS itself, and not have to carry the expense of doing all the work.

The new console is truly delightful, and I’m looking forward to getting one.

Qualcomm Snapdragon 810 Shows Its Stuff

In developing the Snapdragon 810 application processor (AP), Qualcomm incorporated several new features and capabilities, a tricky move for such a complex device and one loaded with potential problems. Some thought they couldn’t do it.

Nevertheless, not only is the 810 on track as promised for mid-2015 device deliveries, it has several “firsts” associated with the new chip:

• It is Qualcomm’s first 20nm SoC (being fabbed at TSMC)
• It’s Qualcomm’s first octa-core, 64-bit ARMv8 CPU implementation (four A57 @ ~2GHz and four A53 @ 1.55GHz)
• It’s the first SoC with dual 14-bit Image Signal Processors (ISPs)
• The 810 introduces the new Adreno 430 GPU
• It has the first dual-channel 1600 MHz LPDDR4 memory implementation in the industry
• It’s the first hardware implementation of 4K (3840 x 2160) HEVC/H.265 video encode
• It’s the first implementation of UFS 2.0 storage support
• It’s the first use of Qualcomm’s WCD9330 analog CODEC
• It’s the industry’s first multi-channel 4G LTE category 9 Carrier Aggregation Connectivity

Although it is Qualcomm’s (QTI’s) first 64-bit octa-core CPU, it uses the ARMv8 Cortex-A57/A53 cores. In the past, QTI has designed its own ARM-ISA CPU, the Krait. The Cortex big.LITTLE CPU, however, has been available for licensing for a while and is used by several other SoC suppliers, so it does not represent much risk for Qualcomm.

However, a report in Business Korea last week said the chip “overheats when it reaches a specific voltage” and “slows down owing to problems with the RAM controller connected to the AP” (the story was echoed by a European web site and, like so many rumors, took on a life of its own, much to the delight of short sellers). But it’s just not true.

The 810, which I got to play with last night in various form factors (tablets and phones), has the industry’s first LPDDR4 memory and controller, running at 1600 MHz (yes, in a mobile device). At that clock speed it’s impressive, because it delivers a total bandwidth of 25.6GB/s (over 2×32-bit channels, 64 bits in all). That kind of memory performance is what’s needed to deliver high-performance video processing, fast OpenCL performance, and high-level graphics performance. Obviously something so critical to the success of the chip wouldn’t be left to chance.
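The bandwidth figure checks out arithmetically; a quick sketch, assuming LPDDR4’s double data rate (two transfers per clock):

```python
# Peak memory bandwidth for the 810's LPDDR4 configuration:
# 2 channels x 32 bits at a 1600 MHz clock, double data rate.

def peak_bandwidth_gbs(channels, bits_per_channel, clock_mhz,
                       transfers_per_clock=2):
    bytes_per_transfer = channels * bits_per_channel / 8
    transfers_per_sec = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_sec / 1e9

print(peak_bandwidth_gbs(2, 32, 1600))  # 25.6 GB/s
```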

QTI introduced the Adreno 430 GPU in the 810; it offers support for OpenGL ES 3.1, hardware tessellation, geometry shaders, and programmable blending. It has frame-buffer compression and can drive an external 4K display at 30 fps, or 1080p video at 120 fps, via HDMI 1.4. The company claims the 430 delivers up to 30% faster graphics performance and 100% faster GP-compute performance than the predecessor Adreno GPU (the 420), while reducing power consumption by up to 20%. Qualcomm has also incorporated a new level of GPU security for composition and management of premium video and other multimedia. These claims are not surprising. QTI has been the leading supplier of GPUs for several years, offering great performance while using little power, and selling more GPUs than any other company.

The Adreno 430 will be featured in top commercially available games like Activision’s popular “Skylanders Trap Team”, running on a 4K display. At the Qualcomm user experience media event in New York, Qualcomm showed the Epic Unreal 4 game engine running high end graphics content on Snapdragon 810’s 64bit ARMv8 CPUs.

QTI has always been good at video; however, the 810 has an upgraded camera suite with gyro stabilization and 3D noise reduction. In the previous AP (the 805), you could capture 12-bit 4K video but not directly drive a 4K screen. In the 810, there are dual 14-bit ISPs capable of supporting 1.2GP/s throughput and image sensors up to 55MP.

There is a lot of I/O: HDMI 1.4, USB 3.0, UFS Gear2, eMMC 5.0, and SD 3.0 (UHS-I). Naturally, there’s a great radio with 802.11ac, and the company’s 4th-generation Cat 6 LTE with support for Qualcomm’s RF360 front-end solution and 3×20MHz carrier aggregation, enabling speeds of up to 300 Mbps, as well as Bluetooth 4.1, NFC, and the latest Qualcomm IZat location core for ubiquitous location services.

The Snapdragon 808 pairs dual Cortex-A57 cores with quad A53 cores and an Adreno 418 GPU, which the company says offers up to 20% faster graphics performance than its predecessor, the Adreno 330 GPU. It’s been designed to support up to WQXGA (2560×1600) displays and has a new level of GPU security. The 808 has dual 12-bit ISPs and uses LPDDR3 memory. It too can drive a 4K screen (via HDMI 1.4).

In April, QTI said it would have 810 units in the hands of its partners by the end of the year, and devices would show up in mid-2015. QTI reports everything with the Snapdragon 810 remains on track, and we expect commercial devices to be available in 1H 2015.

As reported in its year-end report on 5 November, Qualcomm shipped 861 million MSM chips in its fiscal 2014 (ended 28 September 2014).

Qualcomm Moves to 4K with Snapdragon 805

Qualcomm Technologies is bringing console quality graphics with 4K capture and display capability to mobile devices. The company has introduced the new Snapdragon 805, expanding the already widely popular Snapdragon applications processor SoC family, and promises to change the landscape of mobile entertainment systems and communications.

The new SoC has an advanced Adreno GPU, the 420, which Qualcomm says delivers 40 percent more graphics and compute performance than its previous-generation GPU while using 20% less chipset power than the Snapdragon 800 (8974) for the same graphics workloads. The Adreno 420 is not a minor update from the Adreno 330. On the contrary, it is a new GPU architecture, 100% designed in-house by Qualcomm specifically for mobile use cases, and it comes along with other substantial system-level enhancements outside of the GPU itself.

Qualcomm added several completely new 3D pipeline stages in the A420, for example a hull shader, domain shader, and geometry shader, and it is the first application-processor GPU to incorporate dedicated tessellation hardware; the shader subsystem and caches have also been augmented in order to support OpenGL ES 3.0, OpenCL 1.2 Full Profile, and DX11.

Also, some hardware improvements were made to the GPU front end to improve depth rejection, and the texture pipe is more than 2x as capable as it was in Adreno 330, both in terms of raw texel performance and in its support of superior compression schemes like ASTC.

The A420 also includes several render-backend hardware improvements, such as wider and more efficient data paths to internal memory. There's a new color/Z compression module that, when combined with the 8084's more capable 128-bit 800 MHz LPDDR3 (or 933 MHz PCDDR3) memory bus, enables Adreno 420 to maintain its peak pixel fill rate more often, which is particularly important when driving 8-Mpixel (4K×2K) displays.

Qualcomm says they've increased performance without sacrificing power by including more DCVS voltage/frequency pairs and by moving the GPU within Snapdragon to a new low-latency, high-bandwidth MMU/bus (i.e., a dedicated bus that is independent of the video decoder and ISPs).

The 805 introduces a new 2.5 GHz Krait 450 CPU based on the ARMv7 instruction set; however, like Apple and Nvidia, the CPU's microarchitecture is Qualcomm's own design. Unlike other SoC builders, the Krait cores dynamically adjust clock and voltage as system demands change. This manages power consumption for longer battery life, while being able to kick into overdrive if an application demands it. Additional performance enhancements include a new memory controller that doubles memory bandwidth to 25.6 GB/s, making the 805 capable of delivering 4K images and video smoothly and continuously.
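The 25.6 GB/s figure is easy to sanity-check. A minimal sketch, assuming a 128-bit bus and the 800 MHz LPDDR3 clock mentioned elsewhere in this piece, with double data rate (the arithmetic below is mine, not Qualcomm's):

```python
# Peak memory bandwidth from bus width and clock (assumed: 128-bit bus,
# 800 MHz LPDDR3, double data rate -> 1,600 MT/s).
bus_width_bits = 128
clock_hz = 800_000_000
transfers_per_clock = 2  # DDR moves data on both clock edges

bytes_per_transfer = bus_width_bits // 8  # 16 bytes per transfer
bandwidth_gbps = bytes_per_transfer * clock_hz * transfers_per_clock / 1e9
print(f"Peak bandwidth: {bandwidth_gbps:.1f} GB/s")  # 25.6 GB/s
```

The same math with the 933 MHz PCDDR3 option lands near 29.9 GB/s, so 25.6 GB/s is the conservative end of the range.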

The Snapdragon 805 will include two high-bandwidth, wide-bit-width image signal processors (ISPs). They will enable the 805 to easily capture 4K images at high frame rates, high-speed stereo images, dual-camera (front and back) images for video conferencing, and super-fast sports photos.

In order to handle the new 4K images, the 805 will be one of the first production processors to support the H.265/HEVC codec. Qualcomm acquired the assets of IDT's Silicon Optix Hollywood Quality Video (HQV) and Frame Rate Conversion (FRC) video processing product lines back in 2011 and incorporated the technology into its video processing engine. The HQV and FRC processes handle de-interlacing and scaling and can smoothly upscale a Blu-ray file to 4K UHD.
The new Hexagon V6 DSP can multitask, running video conversions and/or enhanced HD multi-channel audio (encoding, decoding, transcoding, noise cancellation, bass boost, virtual surround, and other enhancement functions).

When the Snapdragon 805 goes into full production, it will use TSMC's new 20nm production node, which will reach commercial capacity in early 2014.
Using Qualcomm's Fusion pairing, the Snapdragon 805 can be combined with the new Gobi 9×35 LTE Cat 6 modem supporting 4K streaming and transfer. The 9×35 is smaller and allows an OEM to realize a thinner and more power-efficient part than the current 9×25 modem (introduced in February 2013), which supports LTE Category 4, also known as LTE Advanced. Cat 6 LTE, however, will reach speeds of 300 Mbps (compared to Cat 4's 150 Mbps). The dual-band HSPA+ Gobi MDM 9×35 is the industry's first modem manufactured in a 20nm process technology.

Qualcomm is shipping samples of the Snapdragon 805 and Gobi 9×35 to ODMs and OEMs now. You can expect to see them in amazing new mobile devices second half of 2014.

Takeaways

Adding all the shader types and hardware tessellation, on top of what they did in Subdiv for Moto, is the most aggressive move in mobile graphics by any company, and it shows Qualcomm as the most committed mobile graphics supplier today. It really is bringing console-class graphics to mobile devices.

4K requires a big fat pipe to suck up all the pixels that come blasting at you at 15 billion bits a second. So Qualcomm put in two ISPs to handle the flow. Once you start eating that many bits that fast, you have to compress them, so Qualcomm incorporated the new H.265 codec. And then if you want to send them anywhere, you need a wide memory bus, so Qualcomm doubled the memory bus width and upped the clock rate. Net result: Qualcomm can really do 4K.
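Where does a number like 15 billion bits a second come from? A back-of-envelope sketch, assuming a 3840×2160 UHD frame, 30 bits per pixel, and 60 frames per second (those exact parameters are my assumptions, not stated in the text):

```python
# Uncompressed 4K video bit rate (assumed: UHD 3840x2160 frame,
# 30 bits per pixel, 60 frames per second).
width, height = 3840, 2160
bits_per_pixel = 30  # e.g. 10 bits per color channel
fps = 60

gbits_per_sec = width * height * bits_per_pixel * fps / 1e9
print(f"Uncompressed 4K stream: {gbits_per_sec:.1f} Gbit/s")  # ~14.9 Gbit/s
```

HEVC's job is to squeeze that raw stream down by a couple of orders of magnitude before it ever leaves the chip.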

Trends and Forecasts in Computer Graphics

Having just finished a book on the history of computer graphics I’ve decided to be Janus and also look at the future, which seems appropriate because that’s where we’re all going to end up. Over the next few months I’m going to post essays and thoughts on what I see as the developing trends, and some far out ideas (like a totally voxel-based game and rendering engine). I’d also like to hear from you. What do you think is an interesting, or scary trend, and what would you like to see? Of course, I don’t have to ask for your criticisms; I know you’ll share them with me.

The move to mobile

I think we've all been fascinated with the way mobile devices have taken off in popularity. No small portion of that surge in interest is the display: the place where the user encounters the device. I think it's safe and fair to say Apple gets credit for lighting up our interest and imagination in mobile personal devices with big displays. Not so much to fawn over Apple, but rather to pick a point in time when the revolution started: June 29, 2007. That was when all the things that were being experimented with in various products from various manufacturers came together at once: touch, a high-resolution display (for the times), a large display, an intuitive GPU-accelerated UI, and dozens of applications that worked together smoothly and efficiently.

The impact of the API
The most important things in any system are the interfaces. How does a signal, or a word, or pulse get from one component to another? When data is moved, an API is usually involved. At Siggraph this year, I will give a brief talk at the Khronos BOFS Party (24 June, Hilton California Ballroom, around 19:00 – the party starts at 17:00 or earlier) on the history of APIs, and discuss why they are so important, contentious, and difficult.

The first computer graphics (CG) API was for the IBM 2250 graphics terminal in 1965. Forty-seven years later we got the first really useful and (so far) stable API for mobile devices: OpenGL ES 3.0.

OpenGL ES 3.0 is now available to developers in Android Jellybean (MR2), and game engine companies are in the process of enabling developers to benefit from the new OpenGL ES 3.0 API features.

The Khronos organization develops and manages the APIs used in CG applications. OpenGL ES 1.1, released by Khronos in August 2004, was first used in the Samsung SE P850 in 2005, and the first iPhone, with its PowerVR MBX GPU, supported ES 1.1 two years later. So OpenGL ES really was the enabler of the smartphone revolution.

OpenGL ES 2.0, announced in March 2007, introduced the OpenGL Shading Language for programming vertex and fragment shaders. It removed any fixed functionality that a shader program could replace. That minimized the cost and power consumption of advanced programmable graphics subsystems.

OpenGL (OGL) ES 3.0, introduced about a year ago at Siggraph in August, was a major breakthrough in bringing unified shaders, which enable writing programs for shader effects.

Most importantly, OGL ES 3.0 also added support for 32-bit integer and 32-bit floating-point (i.e., full-precision) data types to the pixel shaders (AKA fragment shaders). This was significant because as shader complexity increases, as it has been, the potential for errors (without full precision) becomes even larger. In addition, bringing full-precision shader capability to a mobile device gives that device the potential to run computer graphics programs of workstation quality and class. This is a major tipping point.

Also worth mentioning is the real int32 support, which is new in OGL ES 3.0 and applies to both fragment and vertex shaders.

Occlusion and stamping
OGL ES 3.0 doesn't have specific geometry shaders; however, it does have several features that help with geometry. Two of the most useful are occlusion queries and geometry instancing. Occlusion queries allow fast hardware testing of whether an object's pixels are blocked (occluded) by other geometry. That's essential in determining whether something doesn't have to be rendered (because it's occluded), thereby saving processing time, which in turn gives higher performance with less battery usage.
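The concept behind an occlusion query can be sketched on the CPU in a few lines. This is a conceptual sketch with made-up scene data, not the actual GL API (in OpenGL ES 3.0 the query is issued with glBeginQuery and a GL_ANY_SAMPLES_PASSED target):

```python
# Conceptual occlusion-query sketch (hypothetical scene data): render
# occluder depths into a buffer, then count how many of a candidate
# object's samples would pass the depth test. Zero passing samples
# means the object is fully occluded and need not be drawn at all.
W, H = 8, 8
FAR = 1.0
depth_buffer = [[FAR] * H for _ in range(W)]

# A large occluder covering the whole buffer at depth 0.4
for x in range(W):
    for y in range(H):
        depth_buffer[x][y] = 0.4

def occlusion_query(samples):
    """samples: iterable of (x, y, depth) for the candidate object."""
    return sum(1 for x, y, z in samples if z < depth_buffer[x][y])

behind = [(x, y, 0.7) for x in range(4) for y in range(4)]    # behind the occluder
in_front = [(x, y, 0.2) for x in range(4) for y in range(4)]  # in front of it

print(occlusion_query(behind))    # 0 samples pass -> skip this object
print(occlusion_query(in_front))  # 16 samples pass -> object is visible
```

The hardware version does exactly this tally as a side effect of rasterization, so the test is nearly free.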

Geometry instancing (AKA "stamping") enables the GPU to draw the same object multiple times while requiring the object to be sent to the rendering pipeline only once.

This makes objects such as birds, foliage, grass, surface rocks, or phony smoke easier for the CPU to set up because the object doesn't have to be sent repeatedly. That in turn also saves battery.
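The saving is easy to see with some hypothetical numbers (the mesh size, vertex layout, and instance count below are all made up for illustration):

```python
# CPU-to-GPU data submitted per frame, with and without instancing
# (all numbers hypothetical).
vertices = 1_000
bytes_per_vertex = 32      # position + normal + UV, a common layout
bytes_per_transform = 64   # one 4x4 float32 matrix per instance
instances = 500            # e.g. 500 blades of grass

naive = instances * vertices * bytes_per_vertex
instanced = vertices * bytes_per_vertex + instances * bytes_per_transform

print(f"naive:     {naive / 1e6:.2f} MB")      # 16.00 MB
print(f"instanced: {instanced / 1e6:.2f} MB")  # 0.06 MB
```

The mesh crosses the bus once; only the small per-instance transforms scale with the count.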

Ever since texture mapping was introduced in the 1970s, it has been used to create amazingly realistic scenes. However, textures can be big, and there can be multiple (smaller) copies of them for mip-mapping. Therefore, it is highly desirable to be able to compress textures, which is relatively easy to do because there is usually a lot of redundancy in a texture image. However, texture compression has been a contentious subject, with claims and counterclaims about IP and patents.

The OGL committee, composed of CG experts from all over the world and all the leading CG companies, spent years pursuing a royalty-free texture compression algorithm. In early 2012, Ericsson offered its ETC family of texture compression algorithms on a royalty-free basis and allowed Khronos to implement it as a standard texture compression format. It was gratefully accepted and deployed for the first time in OGL ES 3.0, another breakthrough.

ETC was introduced in OpenGL ES 2.0 but did not have alpha support, and consequently it was not adopted by developers. Developers instead had to support hardware-specific texture compression formats like ATC, PVRTC, and DXTC (this has been the biggest pain for developers on Android). OpenGL ES 3.0 brings in the newer and better royalty-free ETC2 texture compression format (with alpha support), and all hardware suppliers (IHVs) are required to support it.

This makes it simple for developers: going forward they do not have to compress their textures separately for each device.
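The memory payoff is easy to quantify. A rough sketch for a hypothetical 2048×2048 texture with a full mip chain, assuming uncompressed RGBA8888 at 32 bits per pixel versus ETC2 RGBA at 8 bits per pixel (one 128-bit block per 4×4 tile):

```python
# Texture memory with a full mip chain: uncompressed vs ETC2
# (assumed: RGBA8888 = 32 bpp, ETC2 RGBA = 8 bpp, 2048x2048 base level).
size = 2048
mip_factor = 4 / 3  # a full mip chain adds ~1/3 on top of level 0

def texture_mb(bits_per_pixel):
    return size * size * bits_per_pixel / 8 * mip_factor / 2**20

print(f"RGBA8888: {texture_mb(32):.1f} MB")  # ~21.3 MB
print(f"ETC2:     {texture_mb(8):.1f} MB")   # ~5.3 MB
```

A 4:1 saving per texture, which compounds across the hundreds of textures a modern game ships.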

In addition to texture compression, 3.0 added support for multiple render targets, enabling the GPU to render to multiple textures at once. This is significant in that it enables real-time deferred rendering, which saves enormous GPU cycles and therefore more battery power.


Textures, as mentioned, can get large, and therefore more accuracy is needed in rendering them. ES 3.0 includes support for floating-point textures as well as 3D textures, depth textures, non-power-of-two textures, and overlay (alpha plane) 1- and 2-channel textures (R and R/G).

However, the GPU needs to support an extension to access this functionality. Several GPUs, such as Qualcomm's Adreno and Imagination Technologies', support this, but it is not part of the OGL ES 3.0 requirement.

Shadows and jaggies
Shadow mapping, or projective shadowing, where shadows are added to an image, is a concept introduced by Lance Williams in 1978, and it's been used extensively in CG ever since. However, a shadow map, like any other object, has edges, and those edges, if looked at closely, can have jaggies. An edge-filtering technique (like AA in some sense) can be used, and in OGL ES 3.0 a technique known as "percentage closer filtering" (PCF) is available.

PCF works by filtering the result of the depth comparison. So when comparing a depth, some depths around it should also be compared and the results averaged. This gives a softer look to the shadow edges.

PCF is a newer technique for making softer shadows when using shadow mapping. PCF takes advantage of a dedicated hardware block that does texture filtering after the depth test. Standard texture filtering, by contrast, filters as soon as the texel is fetched and does not wait for the depth test.

It is an almost ideal solution for shadows because it's fast and just accurate enough that it doesn't need a lot of processing.
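The ordering described above (compare first, then average) can be sketched in a few lines; the 3×3 shadow map and depths below are made up for illustration:

```python
# Minimal percentage-closer filtering sketch (hypothetical 3x3 shadow
# map): compare the receiver's depth against EACH texel first, then
# average the binary results -- rather than averaging depths and
# doing a single comparison.
shadow_map = [
    [0.3, 0.3, 0.9],
    [0.3, 0.3, 0.9],
    [0.9, 0.9, 0.9],
]  # stored occluder depths; 0.3 marks a nearby occluder

receiver_depth = 0.6  # depth of the surface point being shaded

def pcf(shadow_map, depth):
    # 1.0 where the receiver is lit (no closer occluder), 0.0 where shadowed
    lit = [1.0 if texel >= depth else 0.0
           for row in shadow_map for texel in row]
    return sum(lit) / len(lit)  # fraction lit -> a soft shadow value

print(f"shadow factor: {pcf(shadow_map, receiver_depth):.2f}")  # 0.56
```

Averaging the depths instead and comparing once would snap the result to 0 or 1 and bring the jaggies right back, which is why the compare-then-filter order matters.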


OGL ES 3.0 also has quality anti-aliasing features like multi-sample antialiasing (MSAA).

OpenGL ES 3.0 has incorporated many features from its big brother, the workstation version of OpenGL. In many respects, OGL ES has advanced features not yet found in Microsoft's DirectX API. That is an extraordinary situation, and it creates an environment where mobile graphics can actually exceed PC graphics, unthinkable a few years ago.

The ecosystem has also been developing. Benchmark developers like Kishonti and Rightware have already announced their benchmarks based on OpenGL ES 3.0, and Futuremark is working on one. With game engine developer Unity and Google's official support for OpenGL ES 3.0 on its way, there will be plenty of games and applications in the Google Play store this year.

Because the API is so buried in the system, and perhaps so esoteric, consumers will never be aware of its features or the power it unleashes in today's (and tomorrow's) SoCs. However, it should be used in the marketing of mobile devices, and some educational work should be done to sensitize users. I think mobile devices should have an "ES 3.0 enabled" logo or at least a line in the specs.

Open GL ES 3.0 now makes AAA games found on a PC possible on a mobile device—I didn’t think I’d be saying something like that for a few more years.

Nvidia’s “Core” Business

Nvidia introduces a new business unit targeted at the other billion opportunities

Nvidia has been involved with embedded, semi-custom and IP sales for special customers like Sony, Microsoft, Intel and Audi for several years, so they are no stranger to the idea, or the issues in supporting such customers.

In the last five or so years we have seen the demand for visualization and embedded compute skyrocket. As one Nvidia exec said to me, "You can't swing a dead cat and not hit a viz app or need." And he's right. That's the good news.

The bad news, if it could be called that, is that no company, not Nvidia, not Intel, not Qualcomm, could satisfy all the opportunities in a timely way. You could call it an embarrassment of riches of opportunities.

In the past couple of years Nvidia invested over a billion dollars in R&D on what could become the first of the ultimate cores—Kepler. Kepler is the first architecture ever designed to scale all the way from mobile to its highest end Tesla processors. It’s unique in that at its most atomic level (192 processors) it is a half-watt device. That is “the core”. That basic Kepler core can be (is) used in all the processors in Nvidia’s entire product line from Tegra to Titan. How’s that for common IP and scalability? And there’s no tricky political side-mouth talking here—it is that core, no special sauce wrapper, or depopulation fusing, or emulations, just one operational core.


The ultimate in scalability, Nvidia’s Kepler core

Again the good-bad news. The core could be used in anything from AR glasses to game consoles, to medical equipment, avionics, CAVEs, and supercomputers. But Nvidia couldn't possibly find all the applications and potential customers for its core, so it's going to share the processor.

Nvidia has set up a business unit, run by the formidable Bob Feldstein, who knows a thing or two about the embedded and core licensing business, to run Nvidia's core business. Feldstein will offer an RTL Kepler design to new customers. Bob isn't a one-man band and has an equally formidable team of engineers and AEs to work with and support customers. So Bob and team are hanging out their sign: Open for Business.

This is a major undertaking for Nvidia, and something that has taken many months to work out; and they're not finished yet. The company probably won't see any significant revenue for two, maybe three, years. It will take that long for the new (non-traditional) customers to spec, design, and manufacture their parts and/or systems. Having said that, don't expect to hear much from Nvidia about this operation. Obviously they won't be able to talk about new products their new customers might be designing.
What do we think?

If you've ever played a AAA FPS game on a PC, you've probably seen the "The way it was meant to be played" Nvidia tag. I guess we can now expect to see a "The way it was meant to be seen" boot-up tag on our TVs, car dashboards, handheld medical devices, CNC machines, and store signs.

Nvidia is right about the abundance of opportunities. Most, if not all of them are being satisfied today with tiny dedicated and un-programmable graphics controllers. But in the world of the internet of things where every machine can talk to every other machine, an un-programmable device is no longer acceptable. The Nvidia Core can be a basic high-performance parallel processor, and/or a great programmable display processor. The opportunity is there, the processor is here, now all Nvidia has to do is let the 8 billion inhabitants of this world know about it.

Time to upgrade Your PC

While preparing for a presentation I was going to give, I looked at some historical data. I like to do that to set the stage for my talks and give the audience a chance to get in synch with my comments.

When you're running in the trenches 24-7 you tend to forget some of the outside-world events, and that happened to me, I'm embarrassed to say. A historical review reset my perspective.

In 2009, Intel had just released the Core i7 Nehalem processor and AMD brought out its 6-core Istanbul processors. We thought those new processors were amazing, and for the time they were, but that was four years ago, and things move fast in this industry.

If you bought an IBM PC in the US for $3,000 in 1981, you were actually spending the equivalent of $7,461 in 2012 dollars. A notebook PC bought in 2009 for $630 would cost the equivalent today of $672, but the ASP of a notebook today is only $546.07.
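The dollar figures above imply a simple conversion factor. A quick sketch using the article's own numbers (the implied average annual inflation rate is my derivation, not from the text):

```python
# Implied CPI ratio and average annual inflation from the article's
# $3,000 (1981) -> $7,461 (2012 dollars) example.
price_1981 = 3_000
price_2012 = 7_461
years = 2012 - 1981

cpi_ratio = price_2012 / price_1981
annual_rate = cpi_ratio ** (1 / years) - 1
print(f"CPI ratio: {cpi_ratio:.3f}")                   # 2.487
print(f"Average annual inflation: {annual_rate:.1%}")  # ~3.0%
```

Against a steady ~3% rise in the general price level, falling PC ASPs are an even bigger real-terms decline than the sticker prices suggest.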


Meanwhile, while the cost of living index has been steadily going up, the average selling price of PCs has been coming down.


Not only have PC prices been going down, but also due to Moore’s law, the performance has been going up.


More computing power is great, but it costs money to run a PC. And electricity, like everything else, is getting more expensive every year.


Yet, even with more powerful processors, bigger disks, and more memory, the average power consumption, thanks again to Moore’s law, has been going down.


Today's PCs use less power, which saves costs, while delivering more performance.

So today, you can get a more powerful computer for fewer dollars than the old 2008 or 2009 unit you have. In addition, the new PCs can do so much more.

The newest applications and programs need more CPU horsepower and memory. Consider the latest games, Windows 8, Office 2010, video transcoding, photo editing, and peripherals that require the latest I/O such as HDMI 1.4, USB 3.0, and DisplayPort 1.2, and soon 4K displays.

Also, with new memory technologies like LP-DIMM, you can now have 48GB to 144GB of RAM in your system, which makes everything run faster.

Consumers and businesses that have old machines, which are ever more expensive to maintain and keep running (and run slowly when they do), have been stalled in their normal update or refresh process by the fanfare over how wonderful tablets are. Tablets are indeed wonderful, and they are a nice contribution and companion to the computing environment. But as has been said by so many, and now so often, tablets are certainly not a replacement for most of what a PC is used for. They are at best a compromise for trying to emulate what one does with a PC.

Now’s the time to get a new PC, the cost-benefits couldn’t be better.

A PC is a Truck and Sometimes You Need a Truck

Steve Jobs said of the PC, as he was developing his post-PC theme, "PCs are going to be like trucks; less people will need them. And this is going to make some people uneasy." Well, he was right, if not accurate. A whole lot of trucks are sold. In just the US in February 2012, 612,145 cars were sold and 537,251 light-duty trucks were sold, plus an additional 225,621 crossover trucks. So more trucks were sold than cars, and even if you add SUVs (97,825) to the cars, there were still more trucks sold in the US, and that's not counting the bigger ones that chew up your streets and bring stuff to your shopping center or gas station.

No doubt about it, tablets and smartphones are popular and selling well. What most folks seem to miss (or want to miss) is that we have cars and trucks and motorcycles, and we will continue to have smartphones (motorcycles), tablets (cars?), and PCs (trucks?). Because tablets are popular doesn't lead to the conclusion that PCs suddenly aren't. But if all you've got to sell is a tablet, well then the world looks a little different. And even if you have a truck to sell, if it's getting hammered by the competition, but your car or motorcycle is showing a better margin and/or shipment level, well, your interest and emphasis is pretty predictable, isn't it?

So Apple wants to promote the "Post PC" concept as a way of casting the PC as a no-longer-important, been-there-done-that, obsolete technology. And because they are Apple, the only company capable of original thought or charisma, and the role model for all other computer, electronics, and phone companies, the term "Post PC" will be adopted and heralded as the coming, the new era, the I'll-follow-Apple-into-hell slogan of all the wannabes, which is everyone from Microsoft to the smallest Chinese cloner.

Actually, Apple makes life so much simpler for the rest of us. We don’t have to invest time and money in clever ideas, or marketing, we just have to wait for Apple to tell us what’s cool now and then copy it as best we can.

The PC/Tablet/Phone industry reminds me of a bunch of teenage girls, watching the cool girls in order to find out what is in and what's out. Nobody wants to get caught with something that's out and not cool; that's worse than a zit breakout on prom night. And it's a caution to the buyers of all this non-Apple, Apple-defined cool stuff. If the company you're considering buying something from is an Apple follower, you should think twice about buying anything from them. It's unlikely they are going to be a faithful supporter of that currently cool copycat thingie; you'd be better off buying the real deal from Apple.

But it’s tough not being Apple when it’s so big, so rich, and so trend setting. Right now the only company that seems to have a chance at standing up to Apple is HP. All the rest are still in the beige/black box PC world where this year’s machine looks just like last year’s machine and the year before it. Where this year’s big breakout product is an Ultrabook that looks like every other Ultrabook which in turn is trying to look like Apple’s Air. It’s pathetic.

So if the PC industry turns into the unexciting truck industry, regardless of shipment numbers, it’s its own fault for being lazy and scared – get some backbone PC makers, take a chance—do something original. You might find you actually like it.

Future TV, touch it, look into it, wave at it.

The choices for user interaction are multiplying rapidly

With IFA wrapped up and IBC coming, all the new TV related products as well as new TVs are in the press.
S3D screens, with and without glasses, have been with us a while. Large-screen displays that allow interaction with gesture have been available as expensive custom devices for election-night and sports TV announcers. The console suppliers introduced lower-cost gesture capabilities to TV screens, and most recently TV suppliers are offering touch screens.

New gesture remote control devices will show up later this year, and there are even a couple of voice-activated devices.
LG's Pentouch TV shown at IFA is a good example. It comes with a pair of Touch Pens that can be used simultaneously on the screen. So users can sit really close to a big-screen TV and draw pictures, or select icons.

Stephen Gater, consumer electronics marketing director LG UK, commented, “We’re all used to touch screens being available on our phones and even tablets now, but LG is one of the first to be offering this technology on large TV screens.”
The LG TV also offers S3D capability with active shutter glasses.

What do we think?
The TV of the future, and that future will be here by the holiday season of 2012, will be amazing. Competing with each other as well as with tablets, phones, and even AIO PCs, the TV set manufacturers are scrambling to be innovative and differentiated. How we interact with the TV and communicate via it will dramatically change in the next few years. Wirelessly and seamlessly connected to every other device we use, our mobile phones and tablets will be remote controls and second screens.

The TV will be a picture frame when not showing Judge Judy, and we'll push, wave at, talk to, and wipe our TVs, as well as watch them in 2D and S3D. TV: you ain't seen nothing like it.