Recognizing and Measuring Vectorization Performance
Vectorization promises to deliver as much as 16 times faster performance by operating on more data with each instruction issued. The code modernization effort aims to get all software running faster by...
Weather Research and Forecasting Model Optimized for Knights Landing
The Weather Research and Forecasting (WRF) Model is a numerical weather prediction (NWP) system designed for both atmospheric research and operational forecasting needs. It is made up of about a half...
Compiling for the Intel® Xeon Phi™ processor and the Intel® AVX-512 ISA
Introduction: This document briefly gives an overview of the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and shows different ways to build an application for the Intel® Xeon Phi™ processor...
Fine-Tuning Optimization for a Numerical Method for Hyperbolic Equations...
Frederico L. Cabral – fcabral@lncc.br, Carla Osthoff – osthoff@lncc.br, Marcio Rentes Borges – marcio.rentes.borges@gmail.com. National Laboratory for Scientific Computing (LNCC). Introduction: In order to...
How to Debug Fortran Coarray Applications on Windows
When a Fortran coarray application is started under the Visual Studio* debugger on Windows, the current debug window does not have control over the images running the application. This article presents...
OpenCL™ Drivers and Runtimes for Intel® Architecture
Some of the OpenCL* Drivers and Runtimes are provided as part of: Intel® SDK for OpenCL™ Applications; Intel® Media Server Studio. Packages Available: Installation of a relevant runtime or driver enables...
Finding your Memory Access performance bottlenecks
How your application accesses memory can dramatically impact performance. It is not enough to parallelize your application by adding threads and vectorization. Memory bandwidth is just as important...
Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques
Improving the computational performance of a deep learning framework. Authors: Vadim Karpusenko, Ph.D., Intel Corporation; Andres Rodriguez, Ph.D., Intel Corporation; Jacek Czaja, Intel Corporation; Mariusz...
Jefferson Lab - Thomas Jefferson National Accelerator Facility
Principal Investigators: Balint Joo is a Computational Scientist working at Jefferson Lab on Lattice QCD calculations. He is a co-author and maintainer of the Chroma code for LQCD calculations, and is...
Intel® Trace Analyzer and Collector 2017 Readme
The Intel® Trace Analyzer and Collector for Linux* and Windows* is a low-overhead scalable event-tracing library with graphical analysis that reduces the time it takes an application developer to...
Improve Performance with Vectorization
This article focuses on the steps to improve software performance with vectorization. Included are examples of full applications along with some simpler cases to illustrate the steps to vectorization....
Tencent In-game Purchase Machine Learning Recommendation System on Intel®...
Online gaming is very popular nowadays, especially among young people, who play during their leisure time with family members or friends. In many cases, players...
Fine-Tuning Vectorization and Memory Traffic on Intel® Xeon Phi™...
by Andrey Vladimirov, Colfax International. Common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel® Xeon Phi™ coprocessors are discussed. These...
Boosting Kingsoft Cloud* Image Processing with Intel® Xeon® Processors
Background: Kingsoft Cloud* is a public cloud service provider. It provides many services, including cloud storage. A massive number of images are stored in Kingsoft Cloud storage. Kingsoft provides not only data...
Developer Success Stories Library
Learn how leading organizations worldwide are using development tools from Intel to boost performance, save development time and costs, and better meet their customers' needs. Intel® Parallel Studio |...
Analyzing GTC-P APEX code using Intel® Advisor on an Intel® Xeon Phi™ processor
Introduction: In this article, we describe how we achieved 35% faster performance in GTC-P APEX code using Intel® Advisor on an Intel® Xeon Phi™ processor code-named Knights Landing (KNL). Using Intel®...
Hybrid Parallelism: A MiniFE* Case Study
In my first article, Hybrid Parallelism: Parallel Distributed Memory and Shared Memory Computing, I discussed the chief forms of parallelism: shared memory parallel programming and distributed memory...
Running Intel® Parallel Studio XE Analysis Tools on Clusters with Slurm* / srun
Since HPC applications target high performance, users are interested in analyzing the runtime performance of such applications. In order to get a representative picture of that performance / behavior,...
Introducing DNN primitives in Intel® Math Kernel Library
Deep Neural Networks (DNNs) are on the cutting edge of the Machine Learning domain. These algorithms received wide industry adoption in the late 1990s and were initially applied to tasks such as...
Intel® Trace Analyzer and Collector 2017 Update 1 Readme
The Intel® Trace Analyzer and Collector for Linux* and Windows* is a low-overhead scalable event-tracing library with graphical analysis that reduces the time it takes an application developer to...