Hadoop Archives - Page 2 of 4 - High-Performance Computing News Analysis

Programming for High Performance Processors

January 5, 2017 by MichaelS

“Managing the work on each node can be referred to as Domain parallelism. During the run of the application, the work assigned to each node can be generally isolated from other nodes. The node can work on its own and needs little communication with other nodes to perform the work. The tools that are needed for this are MPI for the developer, but can take advantage of frameworks such as Hadoop and Spark (for big data analytics). Managing the work for each core or thread will need one level down of control. This type of work will typically invoke a large number of independent tasks that must then share data between the tasks.”
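The two levels described in the excerpt can be illustrated with a minimal single-process sketch. It assumes a simple sum-of-squares workload and uses plain Python chunks to stand in for nodes; in a real deployment the outer level would be MPI ranks or a framework such as Spark. The function names (`split_domain`, `node_work`, `sum_of_squares`) are hypothetical and do not come from the article.

```python
from concurrent.futures import ThreadPoolExecutor


def split_domain(data, parts):
    """Split data into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def partial_sum_of_squares(piece):
    """Independent task run by one thread."""
    return sum(x * x for x in piece)


def node_work(chunk, threads_per_node=2):
    """Inner level: fan one node's chunk out to threads, then combine the partial results."""
    pieces = split_domain(chunk, threads_per_node)
    with ThreadPoolExecutor(max_workers=threads_per_node) as pool:
        partials = list(pool.map(partial_sum_of_squares, pieces))
    return sum(partials)


def sum_of_squares(data, nodes=4, threads_per_node=2):
    """Outer level: each chunk stands in for the work a single node would own."""
    return sum(node_work(chunk, threads_per_node) for chunk in split_domain(data, nodes))


if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))
```

The outer split needs no communication until the final sum, which mirrors the isolated per-node work in the quote, while the inner tasks share their results within the node.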

Filed Under: Compute, Datacenter, Government, HPC Hardware, HPC Software, Industry Segments, Main Feature, News, Parallel Programming, Research / Education, Sponsored Post, Tools Tagged With: Apache Spark, Hadoop, Intel, Intel Parallel Studio XE, Intel TEC, MPI, OpenMP

Extreme-scale Graph Analysis on Blue Waters

August 27, 2016 by Doug Black

George Slota presented this talk at the Blue Waters Symposium. “In recent years, many graph processing frameworks have been introduced with the goal to simplify analysis of real-world graphs on commodity hardware. However, these popular frameworks lack scalability to modern massive-scale datasets. This work introduces a methodology for graph processing on distributed HPC systems that is simple to implement, generalizable to broad classes of graph algorithms, and scales to systems with hundreds of thousands of cores and graphs of billions of vertices and trillions of edges.”

Filed Under: Compute, Events, High Performance Analytics, HPC Hardware, HPC Software, Industry Segments, Main Feature, Research / Education, Resources, Videos Tagged With: big data, Blue Waters Supercomputer, Blue Waters Symposium, graph computing, Graph500, Hadoop, NCSA

Overview of the MVAPICH Project and Future Roadmap

August 26, 2016 by Doug Black

In this video from the 4th Annual MVAPICH User Group, DK Panda from Ohio State University presents: Overview of the MVAPICH Project and Future Roadmap. “This talk will provide an overview of the MVAPICH project (past, present and future). Future roadmap and features for upcoming releases of the MVAPICH2 software family (including MVAPICH2-X, MVAPICH2-GDR, MVAPICH2-Virt, MVAPICH2-EA and MVAPICH2-MIC) will be presented. Current status and future plans for OSU INAM, OEMT and OMB will also be presented.”

Filed Under: Compute, Events, HPC Hardware, HPC Software, Industry Segments, Main Feature, Network, Parallel Programming, Research / Education, Resources, Tools, Videos Tagged With: big data, DK Panda, Hadoop, InfiniBand, MapReduce, Memcached, MUG, MVAPICH, MVAPICH User Group, Ohio State University, weekly

Cray Urika-GX System to Tackle Big Data Analytics

May 24, 2016 by Doug Black

“We took the Aries system interconnect from our supercomputers, the industry-standard architecture of our clusters, the scalable graph engine from the Urika-GD appliance, and the pre-integrated, open infrastructure of our Urika-XA system and combined them into one agile analytics platform. The Urika-GX gives our customers the tool they need to overcome their most advanced analytics challenges today, and the platform to bridge to tomorrow.”

Filed Under: Compute, Datacenter, High Performance Analytics, HPC Hardware, HPC Software, Industry Segments, Network, News, Research / Education, Storage Tagged With: Apache Mesos, Apache Spark, big data, Cray, Cray Urika-GX, Hadoop, HPDA, Intel

RCE Podcast Looks at the Impala Project

April 20, 2016 by staff

In this RCE Podcast, Marcel Kornacker from Cloudera describes the Impala project. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and Apache HBase without requiring data movement or transformation. Impala is integrated with Hadoop to use the same file and data formats, metadata, security and resource management frameworks used by MapReduce, Apache Hive, Apache Pig and other Hadoop software.

Filed Under: Datacenter, Enterprise HPC, High Performance Analytics, HPC Software, Industry Segments, Podcast, Resources Tagged With: big data, Cloudera, Hadoop, Impala, RCE Podcast

Video: Exploiting HPC Technologies to Accelerate Big Data Processing

April 15, 2016 by Doug Black

“This talk will present RDMA-based designs using OpenFabrics Verbs and heterogeneous storage architectures to accelerate multiple components of Hadoop (HDFS, MapReduce, RPC, and HBase), Spark and Memcached. An overview of the associated RDMA-enabled software libraries being designed and publicly distributed as a part of the HiBD project.”

Filed Under: Compute, Datacenter, Enterprise HPC, Events, High Performance Analytics, HPC Hardware, HPC Software, Industry Segments, Network, Research / Education, Resources, Videos Tagged With: big data, Hadoop, Memcached, OpenFabrics Workshop, Spark

Learn Apache Hadoop with Spark in One Day

April 15, 2016 by Doug Black

Hadoop and Spark clusters have a reputation for being extremely difficult to configure, install, and tune, but help is on the way. The good folks at Cluster Monkey are hosting a crash course entitled Apache Hadoop with Spark in One Day. “After completing the workshop attendees will be able to use and navigate a production Hadoop cluster and develop their own projects by building on the workshop examples.”

Filed Under: Compute, Education / Training, High Performance Analytics, HPC Hardware, HPC Software, News, Resources Tagged With: Apache Spark, big data, cluster monkey, Hadoop, Weekly Newsletter Articles

Florida Atlantic University Selects Bright Cluster Manager for HPC

February 22, 2016 by Doug Black

Today Florida Atlantic University (FAU) announced that it is using Bright Cluster Manager software for its HPC cluster. The 56-node cluster is used for teaching Hadoop MapReduce, bioinformatics research and other modeling and visualization work. Administrators say Bright Cluster Manager has significantly increased automation and is easily scalable to meet expected future growth.

Filed Under: Compute, High Performance Analytics, HPC Hardware, HPC Software, Industry Segments, News, Research / Education, Systems Management Tagged With: Bright Cluster Manager, FAU, Hadoop, Weekly Newsletter Articles

Chalk Talk: What is a Data Lake?

February 5, 2016 by Doug Black

“If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” These “data lake” systems will hold massive amounts of data and be accessible through file and web interfaces. Data protection for data lakes will consist of replicas and will not require backup since the data is not updated. Erasure coding will be used to protect large data sets and enable fast recovery. Open source will be used to reduce licensing costs and compute systems will be optimized for MapReduce analytics. Automated tiering will be employed for performance and long-term retention requirements. Cold storage, storage that will not require power for long-term retention, will be introduced in the form of tape or optical media.

Filed Under: Datacenter, High Performance Analytics, HPC Hardware, HPC Software, Industry Segments, Main Feature, Resources, Storage, Videos Tagged With: big data, Cassandra, Data Lake, Hadoop, HDS, Pentaho, Storage Switzerland

Scientific Cloud Computing Lags Behind the Enterprise

October 11, 2015 by Doug Black

“In business and commercial computing, momentum towards cloud and big data has already built up to the point where it is unstoppable. In technical computing, the growth of the Internet of Things is pressing towards convergence of technologies, but obstacles remain, in that HPC and big data have evolved different hardware and software systems while OpenStack, the open source cloud computing platform, does not work well with HPC.”

Filed Under: Cloud HPC, Enterprise HPC, Events, HPC Software, Industry Segments, Main Feature, Manufacturing, News, Research / Education, Resources Tagged With: big data, Hadoop, Intel, ISC Cloud & Big Data, Lustre, Scientific Computing
