Inside HPC & AI News | High-Performance Computing & Artificial Intelligence

Faster AI and HPC Workflows with Quobyte’s New File Query Engine

July 12, 2024 by staff

[SPONSORED GUEST ARTICLE]  In the world of high-performance computing (HPC), where petabyte-scale storage and billions of files are commonplace, efficiently managing and querying massive data stores is crucial. Recognizing this challenge, Quobyte has introduced its File Query Engine, a powerful new tool designed to complement its existing policy engine and analytics functionality.

The Quobyte File Query Engine offers a distributed, high-performance solution for querying file system metadata like a database, addressing key pain points for HPC administrators and users alike. This innovative feature, part of Quobyte’s latest release 3.22, promises to streamline data management and accelerate AI and HPC workflows in large-scale environments.

Accelerating Metadata Queries in HPC Environments

One of the primary advantages of Quobyte’s File Query Engine is its ability to rapidly execute metadata queries across massive datasets. Traditional methods, such as file system tree walks, can take hours or even days to complete on large volumes. The File Query Engine cuts this time dramatically because it queries the metadata store directly and in parallel instead of visiting every directory one by one. Administrators can therefore quickly answer critical questions about their data landscape.

For instance, HPC administrators can now efficiently identify cold files consuming significant space, locate all files owned by a specific user, or implement data lifecycle management policies, such as deleting files in scratch directories older than a specified timeframe.
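To make the contrast concrete, here is a minimal Python sketch of the tree-walk approach that the File Query Engine is meant to replace. It covers two of the tasks above: finding old files under a scratch directory and finding files owned by one user. It uses only the standard library and treats modification time as a proxy for "cold". The function name `find_files` is hypothetical, and the sketch illustrates the baseline, not Quobyte's engine.

```python
import os
import time
from typing import Iterator, Optional

def find_files(root: str,
               older_than_days: Optional[float] = None,
               owner_uid: Optional[int] = None) -> Iterator[str]:
    """Hypothetical tree-walk baseline: visit every file under root and
    yield the paths that match all of the given conditions.

    Every file costs one lstat() call, and the walk runs sequentially,
    which is why this approach scales poorly to billions of files.
    """
    cutoff = None
    if older_than_days is not None:
        cutoff = time.time() - older_than_days * 86400  # 86400 s per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # the file vanished while we were walking
            if cutoff is not None and st.st_mtime >= cutoff:
                continue  # modified too recently to count as old
            if owner_uid is not None and st.st_uid != owner_uid:
                continue  # owned by someone else
            yield path
```

For example, `find_files("/scratch", older_than_days=30)` would list candidates for a scratch cleanup policy. The run time grows with the total number of files, whatever the size of the result.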

Enhancing AI/ML Workflows

The File Query Engine’s capabilities extend beyond administrative tasks, offering particular benefits for AI and machine learning workflows. By leveraging user-defined metadata (extended attributes and S3 custom metadata), researchers can more effectively manage training datasets. This approach allows for direct labeling of files with relevant metadata, eliminating the need for separate, hard-to-manage metadata files often used in AI/ML pipelines.
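The following is a minimal sketch of labeling a training file with extended attributes from Python on Linux. It uses the standard `os.setxattr`, `os.getxattr` and `os.listxattr` calls. The `user.` namespace prefix, the helper names and the `origin`/`width` keys are assumptions for illustration. How Quobyte exposes such attributes to queries as `xattr.<name>` fields is not covered here.

```python
import os

# Hypothetical helpers; the "user." namespace is the standard Linux
# namespace for unprivileged, application-defined attributes.
PREFIX = "user."

def label_file(path: str, labels: dict) -> None:
    """Attach each label to the file as a user-namespace extended attribute."""
    for key, value in labels.items():
        # xattr values are raw bytes, so every label is stored as UTF-8 text.
        os.setxattr(path, PREFIX + key, str(value).encode("utf-8"))

def read_labels(path: str) -> dict:
    """Return all user-namespace attributes of the file as {key: text}."""
    return {
        name[len(PREFIX):]: os.getxattr(path, name).decode("utf-8")
        for name in os.listxattr(path)
        if name.startswith(PREFIX)
    }
```

For example, `label_file("img_0001.jpg", {"origin": "FR", "width": 1024})` would tag an image in place. No side-car metadata file is needed. The numeric width is read back as the text `"1024"`, so any numeric comparison has to parse it first.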

Architecture and Performance Advantages

What sets Quobyte’s File Query Engine apart is its integration with the file system’s distributed metadata architecture. Unlike solutions that require separate database layers, Quobyte’s engine operates directly on the distributed and replicated key-value store that houses its metadata. This design choice offers several advantages:

  1. Improved Performance: By eliminating the need for data synchronization between the file system and a separate database, queries execute faster and always operate on current data.
  2. Resource Efficiency: Because there is no redundant copy of the metadata, resource overhead such as RAM and disk consumption is significantly reduced.
  3. Scalability: Because Quobyte’s metadata store is itself distributed, queries are executed in parallel across all metadata servers, enabling rapid scans of entire clusters or selected volumes.
  4. Real-time Streaming: Results are streamed back to the application in real time, supporting very large result sets with billions of files while automatically adjusting to the consumer’s processing speed.
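Items 3 and 4 combine a parallel scan across metadata servers with streamed, flow-controlled results. The pattern can be sketched in Python with one thread per shard and a bounded queue. The shards, the record format and the `parallel_query` function are hypothetical stand-ins for Quobyte's metadata servers. This illustrates the general pattern and is not Quobyte's implementation.

```python
import queue
import threading
from typing import Callable, Iterator

Record = dict      # hypothetical: one file's metadata as a plain dict
_DONE = object()   # sentinel a scanner thread sends when its shard is finished

def parallel_query(shards: list[list[Record]],
                   predicate: Callable[[Record], bool],
                   buffer_size: int = 64) -> Iterator[Record]:
    """Scan every shard in its own thread and stream matches to the caller.

    The bounded queue provides backpressure: if the consumer is slow, the
    queue fills up and scanner threads block in put() until it drains.
    """
    results: queue.Queue = queue.Queue(maxsize=buffer_size)

    def scan(shard: list[Record]) -> None:
        try:
            for record in shard:
                if predicate(record):
                    results.put(record)  # blocks while the buffer is full
        finally:
            results.put(_DONE)  # always report completion, even on error

    threads = [threading.Thread(target=scan, args=(s,), daemon=True)
               for s in shards]
    for t in threads:
        t.start()

    finished = 0
    while finished < len(threads):
        item = results.get()
        if item is _DONE:
            finished += 1
        else:
            yield item  # hand each match to the consumer as soon as it arrives
```

The bounded queue is what lets the stream adapt to the consumer. Scanners stop producing once `buffer_size` matches are waiting, so client memory stays flat regardless of how many files match. One caveat of this simple version is that a consumer which stops iterating early leaves the daemon scanner threads blocked.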

Practical Application and Ease of Use

The File Query Engine is accessible through Quobyte’s command-line tool “qmgmt,” its API, and predefined metadata searches available directly from the Webconsole, offering flexibility for various use cases. Administrators and researchers can easily construct queries to filter files based on a wide range of criteria, including file attributes, modification times, and custom metadata. For common queries, such as “Failure domain file spread,” the Webconsole provides an intuitive interface, eliminating the need to dive into the command line.

For example, a simple command can identify all JPEG files modified in the last 10 minutes:

qmgmt query files 'name~=".*(jpeg|jpg)" AND mtime_age<"10min"'
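To spell out what that filter selects, here is the same condition written as a plain Python predicate over one file's metadata record. The record layout and the function name are hypothetical. The sketch makes two assumptions, neither taken from Quobyte's documentation: that `~=` means a regular-expression match against the whole file name, and that `mtime_age` means the time since last modification.

```python
import re
import time
from typing import Optional

# Assumed semantics: the pattern must match the entire name.
JPEG_NAME = re.compile(r".*(jpeg|jpg)")

def recent_jpeg(record: dict, now: Optional[float] = None,
                max_age_s: float = 600) -> bool:
    """Hypothetical equivalent of name~=".*(jpeg|jpg)" AND mtime_age<"10min".

    record is assumed to hold "name" (str) and "mtime" (Unix seconds).
    """
    now = time.time() if now is None else now
    name_matches = JPEG_NAME.fullmatch(record["name"]) is not None
    age_s = now - record["mtime"]  # seconds since last modification
    return name_matches and age_s < max_age_s  # 600 s = 10 minutes
```

As written, the match is case-sensitive, so `CAT.JPG` would not be selected. Whether the engine's `~=` behaves the same way is an assumption worth checking.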

More complex queries leveraging user-defined metadata are also supported, enabling precise data selection for analysis or processing:

qmgmt query files 'xattr.origin="FR" AND xattr.width>=1024'

This query would return all files with a custom “origin” attribute set to “FR” (France) and a width of at least 1024 pixels, demonstrating the engine’s potential for detailed dataset curation in research environments.
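Extended attribute values are stored as raw bytes, so a condition like `xattr.width>=1024` only makes sense if the value is interpreted as a number. Compared as text, `"800"` would sort after `"1024"`. The sketch below evaluates the example query in Python over a hypothetical dict of one file's attributes (keys without the `user.` prefix, values as bytes). The function name is hypothetical. How the engine itself types attribute values is not stated in this article and is not assumed.

```python
from typing import Mapping

def origin_fr_wide(xattrs: Mapping[str, bytes]) -> bool:
    """Hypothetical equivalent of xattr.origin="FR" AND xattr.width>=1024."""
    origin = xattrs.get("origin")
    width = xattrs.get("width")
    if origin is None or width is None:
        return False  # a file lacking either label cannot match
    try:
        origin_text = origin.decode("utf-8")
        width_px = int(width.decode("utf-8"))  # compare as a number, not text
    except ValueError:  # also covers UnicodeDecodeError
        return False  # undecodable or non-numeric values never match
    return origin_text == "FR" and width_px >= 1024
```

Files without the labels, or with malformed values, simply fall out of the result. This matches the intent of curating a clean training subset.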

Conclusion

Quobyte’s File Query Engine represents a significant advancement in managing and querying large-scale storage environments common in HPC settings. By offering rapid, resource-efficient metadata queries without additional infrastructure, it promises to enhance both administrative efficiency and research workflows. As data volumes continue to grow in scientific and high-performance computing environments, tools like the Quobyte File Query Engine will become increasingly vital in harnessing the full potential of big data in research and analysis.

Filed Under: HPC-AI Hardware, HPC-AI Software, Machine Learning, News, Storage
Tagged With: AI, high performance storage, HPC AI, HPC, Metadata, metadata management, Quobyte, Quobyte File Query Engine

Copyright © 2025