High-Performance Computing News Analysis | insideHPC
At the Convergence of HPC, AI and Quantum

Faster AI and HPC Workflows with Quobyte’s New File Query Engine

July 12, 2024 by staff

[SPONSORED GUEST ARTICLE]  In the world of high-performance computing (HPC), where petabyte-scale storage and billions of files are commonplace, efficiently managing and querying massive data stores is crucial. Recognizing this challenge, Quobyte has introduced its File Query Engine, a powerful new tool designed to complement its existing policy engine and analytics functionality.

The Quobyte File Query Engine offers a distributed, high-performance way to query file system metadata like a database, addressing key pain points for HPC administrators and users alike. The feature, part of Quobyte's latest release (3.22), promises to streamline data management and accelerate AI and HPC workflows in large-scale environments.

Accelerating Metadata Queries in HPC Environments

One of the primary advantages of Quobyte’s File Query Engine is its ability to rapidly execute metadata queries across massive datasets. Traditional methods, such as file system tree walks, can take hours or even days to complete on large volumes. The File Query Engine dramatically reduces this time, enabling administrators to quickly answer critical questions about their data landscape.

For instance, HPC administrators can now efficiently identify cold files consuming significant space, locate all files owned by a specific user, or implement data lifecycle management policies, such as deleting files in scratch directories older than a specified timeframe.
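To make the contrast concrete, here is a minimal Python sketch of the traditional approach: a single-threaded tree walk that answers the three administrator questions above by calling stat() on every file. It is not Quobyte code. The function names (`walk`, `cold_files`, `owned_by`, `expired_scratch`) and the use of access time to define "cold" are illustrative assumptions. The point is the cost model: every question repeats a full traversal, so runtime grows with the total number of files.

```python
import os
import time
from typing import Iterator, List, NamedTuple, Optional

SECONDS_PER_DAY = 86400


class FileInfo(NamedTuple):
    path: str
    size: int
    uid: int
    atime: float
    mtime: float


def walk(root: str) -> Iterator[FileInfo]:
    """Depth-first tree walk: one directory listing and one stat() per file."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable or vanished directory: skip it
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    yield FileInfo(entry.path, st.st_size, st.st_uid,
                                   st.st_atime, st.st_mtime)
            except OSError:
                continue  # entry removed between listing and stat


def cold_files(root: str, idle_days: float, min_bytes: int,
               now: Optional[float] = None) -> List[FileInfo]:
    """Large files not accessed for idle_days (judged by atime)."""
    now = time.time() if now is None else now
    cutoff = now - idle_days * SECONDS_PER_DAY
    return [f for f in walk(root) if f.atime < cutoff and f.size >= min_bytes]


def owned_by(root: str, uid: int) -> List[FileInfo]:
    """All regular files belonging to one user."""
    return [f for f in walk(root) if f.uid == uid]


def expired_scratch(root: str, max_age_days: float,
                    now: Optional[float] = None) -> List[str]:
    """Paths not modified for max_age_days (dry run: nothing is deleted)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [f.path for f in walk(root) if f.mtime < cutoff]
```

`expired_scratch` only returns candidate paths; a real cleanup job would pass them to `os.remove` after review. On file systems mounted with `noatime` or `relatime`, access times may be stale, which is one more reason cold-file reports built from a tree walk can mislead.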

Enhancing AI/ML Workflows

The File Query Engine’s capabilities extend beyond administrative tasks, offering particular benefits for AI and machine learning workflows. By leveraging user-defined metadata (extended attributes and S3 custom metadata), researchers can more effectively manage training datasets. This approach allows for direct labeling of files with relevant metadata, eliminating the need for separate, hard-to-manage metadata files often used in AI/ML pipelines.
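On a Linux client, labeling a file with user-defined metadata can be as simple as setting extended attributes in the unprivileged `user.` namespace. The minimal Python sketch below uses the standard `os.setxattr`, `os.getxattr`, and `os.listxattr` calls, which are Linux-only and require a file system with user xattr support. Whether attributes written this way through a Quobyte mount surface in queries as `xattr.origin` and `xattr.width` is an assumption made for illustration, and the helper names `encode_labels`, `label_file`, and `read_labels` are hypothetical.

```python
import os
from typing import Dict, Union

Label = Union[str, int]
PREFIX = "user."  # Linux namespace for unprivileged, user-defined attributes


def encode_labels(labels: Dict[str, Label]) -> Dict[str, bytes]:
    """Map {"origin": "FR", "width": 1024} to xattr names and byte values."""
    return {PREFIX + key: str(value).encode("utf-8")
            for key, value in labels.items()}


def label_file(path: str, labels: Dict[str, Label]) -> None:
    """Attach each label to the file itself as an extended attribute."""
    for name, value in encode_labels(labels).items():
        os.setxattr(path, name, value)


def read_labels(path: str) -> Dict[str, str]:
    """Read back every user.* attribute on the file as text."""
    return {
        name[len(PREFIX):]: os.getxattr(path, name).decode("utf-8")
        for name in os.listxattr(path)
        if name.startswith(PREFIX)
    }
```

For example, `label_file("img_0001.jpg", {"origin": "FR", "width": 1024})` stores `user.origin=FR` and `user.width=1024` on the image itself, so the labels travel with the file instead of living in a separate manifest. Note that the width is stored as text, which matters when it is later compared as a number.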

Architecture and Performance Advantages

What sets Quobyte’s File Query Engine apart is its integration with the file system’s distributed metadata architecture. Unlike solutions that require separate database layers, Quobyte’s engine operates directly on the distributed and replicated key-value store that houses its metadata. This design choice offers several advantages:

  1. Improved Performance: Because there is no separate database to keep synchronized with the file system, queries execute faster and always operate on current data.
  2. Resource Efficiency: The absence of a redundant metadata copy significantly reduces resource overhead like RAM and disk consumption.
  3. Scalability: Because Quobyte's metadata store is distributed, queries execute in parallel across all metadata servers, enabling rapid scans of entire clusters or selected volumes.
  4. Real-time Streaming: Results are streamed back to the application in real-time, supporting very large result sets with billions of files while automatically adjusting to the consumer’s processing speed.
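The scalability and streaming points together describe a scatter-gather pattern. The following Python sketch shows that pattern in miniature, under loudly stated assumptions: each "shard" is a plain in-memory iterable standing in for one metadata server, threads stand in for servers scanning concurrently, and a bounded `queue.Queue` stands in for the flow control that lets results adjust to the consumer's speed. The names `scatter_gather` and `buffer_size` are hypothetical, and nothing here describes how Quobyte actually implements its query protocol.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
_DONE = object()  # sentinel each shard worker emits when its scan finishes


def scatter_gather(
    shards: List[Iterable[Record]],
    predicate: Callable[[Record], bool],
    buffer_size: int = 1024,
) -> Iterator[Record]:
    """Scan every shard in parallel and stream matching records to the caller.

    The bounded queue provides backpressure: when the consumer falls behind,
    put() blocks and the shard workers pause instead of buffering an
    unbounded result set in memory.
    """
    out: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)

    def scan(shard: Iterable[Record]) -> None:
        try:
            for record in shard:
                if predicate(record):
                    out.put(record)  # blocks while the buffer is full
        finally:
            out.put(_DONE)  # always signal completion, even on error

    workers = [threading.Thread(target=scan, args=(s,), daemon=True)
               for s in shards]
    for w in workers:
        w.start()

    remaining = len(workers)
    while remaining:
        item = out.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item  # results arrive in completion order, not shard order
```

Because `put()` blocks once `buffer_size` items are waiting, memory use stays bounded no matter how many records match. The sketch leaves out two things a production version would need: a caller that stops iterating early leaves workers blocked (they are daemon threads here), and an exception in `predicate` simply ends that shard's scan early rather than reaching the caller.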

Practical Application and Ease of Use

The File Query Engine is accessible through Quobyte’s command-line tool “qmgmt,” its API, and predefined metadata searches available directly from the Webconsole, offering flexibility for various use cases. Administrators and researchers can easily construct queries to filter files based on a wide range of criteria, including file attributes, modification times, and custom metadata. For common queries, such as “Failure domain file spread,” the Webconsole provides an intuitive interface, eliminating the need to dive into the command line.

For example, a simple command can identify all JPEG files modified in the last 10 minutes:

qmgmt query files 'name~=".*(jpeg|jpg)" AND mtime_age<"10min"'
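This query combines a regular-expression match on the file name with a relative-time condition. As a hedged illustration of what such a predicate evaluates for each metadata record, here is a Python equivalent. It assumes that `~=` means a full regular-expression match on the name and that `mtime_age` is the current time minus the modification time; both readings are inferred from the example rather than taken from Quobyte documentation, and `FileRecord` and `is_recent_jpeg` are hypothetical names.

```python
import re
import time
from dataclasses import dataclass
from typing import Optional

# Assumed reading of name~=".*(jpeg|jpg)": the whole name must match the regex.
JPEG_NAME = re.compile(r".*(jpeg|jpg)")
MAX_AGE_S = 10 * 60  # "10min" expressed in seconds


@dataclass(frozen=True)
class FileRecord:
    name: str
    mtime: float  # last modification time, seconds since the epoch


def is_recent_jpeg(rec: FileRecord, now: Optional[float] = None) -> bool:
    """Equivalent of: name~=".*(jpeg|jpg)" AND mtime_age<"10min"."""
    now = time.time() if now is None else now
    name_ok = JPEG_NAME.fullmatch(rec.name) is not None
    mtime_age = now - rec.mtime
    return name_ok and mtime_age < MAX_AGE_S
```

Under this reading the pattern is case-sensitive, so `D.JPG` does not match, while any name that merely ends in `jpg` (for example `notjpg`) does. A stricter pattern such as `.*\.(jpe?g)` would require the dot.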

More complex queries leveraging user-defined metadata are also supported, enabling precise data selection for analysis or processing:

qmgmt query files 'xattr.origin="FR" AND xattr.width>=1024'

This query would return all files with a custom “origin” attribute set to “FR” (France) and a width of at least 1024 pixels, demonstrating the engine’s potential for detailed dataset curation in research environments.
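One detail worth making explicit is type handling. Extended attribute values are stored as bytes or text, so a condition like `xattr.width>=1024` only means "at least 1024 pixels" if the value is compared as a number. The Python sketch below assumes records that pair a path with its already-decoded attributes (a hypothetical shape, with hypothetical names `origin_fr_wide` and `curate`) and assumes that files lacking either attribute do not match. It illustrates the filtering logic only; it is not a description of how Quobyte's engine parses or types attribute values.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (path, {attribute name: value as stored text})
Entry = Tuple[str, Dict[str, str]]


def origin_fr_wide(xattrs: Dict[str, str]) -> bool:
    """Equivalent of: xattr.origin="FR" AND xattr.width>=1024."""
    if xattrs.get("origin") != "FR":
        return False
    try:
        # Compare numerically: as strings, "900" >= "1024" would be True.
        return int(xattrs["width"]) >= 1024
    except (KeyError, ValueError):
        return False  # a missing or non-numeric width never matches


def curate(entries: Iterable[Entry]) -> Iterator[str]:
    """Yield the paths of files whose attributes satisfy the query."""
    for path, xattrs in entries:
        if origin_fr_wide(xattrs):
            yield path
```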

Conclusion

Quobyte’s File Query Engine represents a significant advancement in managing and querying large-scale storage environments common in HPC settings. By offering rapid, resource-efficient metadata queries without additional infrastructure, it promises to enhance both administrative efficiency and research workflows. As data volumes continue to grow in scientific and high-performance computing environments, tools like the Quobyte File Query Engine will become increasingly vital in harnessing the full potential of big data in research and analysis.

Filed Under: HPC Hardware, HPC Software, Machine Learning, News, Storage
Tagged With: AI, high performance storage, HPC AI, HPC, Metadata, metadata management, Quobyte, Quobyte File Query Engine
Copyright © 2025