Published in Parallel & Distributed Computing For Data Enthusiasts: Partitioning Data at Runtime for Data Analytical Queries. Partitioning data is a fundamental technique data analytical systems use to achieve parallel execution in a distributed setting or a single… (Feb 28)
Published in Parallel & Distributed Computing For Data Enthusiasts: Accelerated Data Analytics. Once a niche technology, accelerated computing has become synonymous with GPUs and Artificial Intelligence. While specialized hardware like… (Feb 22)
Published in Parallel & Distributed Computing For Data Enthusiasts: Compute Clusters — An Introduction. With public clouds, large-scale computing is becoming a commodity where anyone can access a reasonably sized computer cluster in minutes… (Feb 12)
Published in Parallel & Distributed Computing For Data Enthusiasts: Life Cycle of a Parallel Application. Even though many types of hosting systems exist, including clouds, bare-metal clusters, and supercomputers, a distributed parallel data… (Feb 4)
Cluster Resources — Job Scheduling. A computer cluster consists of many computer nodes that are closely coupled together with a network. A computer cluster is used for… (Feb 1)
Published in Parallel & Distributed Computing For Data Enthusiasts: Why Data Analytical Queries Scale Well. In a previous article, we discussed that data-intensive applications scale well but didn’t go into the details too much. In this article… (Jan 29)
Published in Parallel & Distributed Computing For Data Enthusiasts: Scalability of IO-Intensive Applications. Modern computing applications can be divided into two broad categories: IO-intensive and compute-intensive. As the name suggests… (Oct 15, 2024)
Published in Parallel & Distributed Computing For Data Enthusiasts: Visual Guide to Distribution Patterns for Arrays in MPI, NCCL. When multiple processes are involved in a parallel computation, they must communicate periodically to synchronize the data. Many libraries… (Sep 16, 2024)
Published in Parallel & Distributed Computing For Data Enthusiasts: Introduction to UCX Network Programming. Network programming is always a hassle; on the bright side, as developers, we rarely get to write network programs ourselves. In most… (Sep 20, 2024)
Published in Parallel & Distributed Computing For Data Enthusiasts: Parquet, Orc, Avro, CSV and JSON. We encounter data in many different formats. Some common examples are CSV, JSON, XML, Text, and Binary types. Every such format has a… (Sep 5, 2024)