10 Questions to Ask When Starting With AI – Part 4

In this insideHPC Guide, “10 Questions to Ask When Starting With AI,” our friends over at WEKA offer 10 important questions to ask when starting with AI, with a focus on planning for success beyond the initial stages of a project. Reasons given for AI project failures include not having a plan ahead of time, not getting executive or business leadership buy-in, or failing to find the proper team to execute the project. Chasing the hot technology trend without a proper strategy often leads companies down the path of failure.

Artificial intelligence (AI) and machine learning (ML) technologies are disrupting virtually all industries globally, and AI technologies are not just being applied within robotics and vehicle automation. Companies from financial services to retail, from manufacturing to health and life sciences are seeing business improvements through insights generated by AI and ML.

#9 How does my infrastructure look on day 3 vs. day 300?

AI projects are constantly changing and evolving. The algorithms or software could change, as could the computing infrastructure: a model could start out running on company-owned servers and later move to a public cloud or a hybrid platform. If a company has aligned its AI data strategy with the organization’s overall compute strategy (see question #3), this is not much of a problem.

“For example, today a company might be running on premises, with one or two data scientists running from their laptops with an external GPU,” says Ben David. “I know that if everything works out in a year, then I’ll have 20 data scientists, and then I’ll need a heavier infrastructure. You want to plan for that. Again, the notion is that if you know it on day one, two and three, etc., then you can plan ahead for it.”

As data volumes scale and models become more complex, the need for more robust compute grows with them; otherwise, having 20x the volume of data means that your models will take roughly 20x longer to run, reducing productivity and agility. Compute needs pipes that can saturate it, so make sure that you can expand your pipes (i.e., your network) accordingly.
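To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that the time to read a dataset once grows linearly with its size when throughput is fixed, and that the aggregate bandwidth needed to keep compute busy is the per-GPU ingest rate times the number of GPUs. The function names and all figures (dataset sizes, throughput, GPU count) are hypothetical placeholders, not numbers from the guide.

```python
def epoch_hours(dataset_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to read the whole dataset once at a sustained throughput."""
    dataset_gb = dataset_tb * 1000  # decimal units: 1 TB = 1000 GB
    return dataset_gb / throughput_gb_per_s / 3600


def required_throughput_gb_per_s(num_gpus: int, per_gpu_ingest_gb_per_s: float) -> float:
    """Aggregate read bandwidth needed so that no GPU waits on data."""
    return num_gpus * per_gpu_ingest_gb_per_s


# Hypothetical figures for illustration only.
day3_tb = 5.0               # dataset size early in the project
day300_tb = day3_tb * 20    # 20x data growth
pipe_gb_per_s = 2.0         # sustained storage/network throughput, unchanged

day3_hours = epoch_hours(day3_tb, pipe_gb_per_s)
day300_hours = epoch_hours(day300_tb, pipe_gb_per_s)
slowdown = day300_hours / day3_hours

print(f"Day 3:   {day3_hours:.2f} h per pass over the data")
print(f"Day 300: {day300_hours:.2f} h per pass over the data")
print(f"Slowdown with the same pipes: {slowdown:.0f}x")

# Bandwidth needed if the team grows to 8 GPUs that each ingest 1.5 GB/s.
print(f"Needed: {required_throughput_gb_per_s(8, 1.5):.1f} GB/s")
```

Under these assumptions the only ways to hold the time per pass steady are to raise throughput in step with the data or to add compute, which in turn raises the bandwidth the network has to deliver.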

One frequent and expensive mistake companies make is not planning for significant data growth over the course of the project. Amassing 20x more data means a significant increase in storage costs and additional delays, often because more data is stored in cold tiers and has to be moved back and forth to hot/fast tiers. Those reads and writes are time consuming. Some companies tier some data in the cloud for economies of scale and flexible capacity, which introduces management overhead with multiple name servers and different operational models.
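The cost and delay trade-off of tiering can be estimated with a few lines of Python. This is a minimal sketch that assumes flat per-terabyte monthly prices for a hot and a cold tier and a fixed link speed between them; the prices, capacities, and function names are hypothetical and are not taken from the guide or from any vendor.

```python
def monthly_storage_cost(total_tb: float, hot_fraction: float,
                         hot_price_per_tb: float, cold_price_per_tb: float) -> float:
    """Monthly cost when a fraction of the data sits on the hot tier."""
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return hot_tb * hot_price_per_tb + cold_tb * cold_price_per_tb


def recall_hours(recall_tb: float, link_gb_per_s: float) -> float:
    """Hours to move data from the cold tier back to the hot tier."""
    return recall_tb * 1000 / link_gb_per_s / 3600  # 1 TB = 1000 GB


# Hypothetical prices in dollars per TB per month.
HOT_PRICE = 100.0
COLD_PRICE = 10.0

# Early in the project: 50 TB, all of it on the hot tier.
early_cost = monthly_storage_cost(50, 1.0, HOT_PRICE, COLD_PRICE)

# After 20x growth: 1000 TB, only 10% kept hot.
later_cost = monthly_storage_cost(1000, 0.10, HOT_PRICE, COLD_PRICE)

# Delay to pull 100 TB back from the cold tier over a 1 GB/s link.
delay = recall_hours(100, 1.0)

print(f"Early monthly cost: ${early_cost:,.0f}")
print(f"Later monthly cost: ${later_cost:,.0f}")
print(f"Recall delay for 100 TB: {delay:.1f} h")
```

Even with most of the data pushed to the cheaper tier, the bill in this example still grows severalfold, and each recall of cold data adds hours before work can start.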

Newer file systems, such as WekaFS, manage the different tiers under a single name server with throughput that is comparable to local storage. Using a modern file system can dramatically alleviate the cost and management burden, helping you to keep productivity high as data increases. Most modern file systems are designed from the ground up to support exabytes of data and AI and ML workloads.

#10 How do we future-proof the project?

Ben David says he sees many companies kicking off AI projects with high hopes for success, but the team has not taken a holistic view of the entire project, so down the line they run into trouble when it comes to growth. “We see projects that are starting with some environments that are adequate for one to five data scientists, but then the environment expands and suddenly they need additional infrastructure,” he says. “More often than not, you see customers trying to extend their existing infrastructure instead of re-architecting it.”

For example, a data scientist might start to work on a single laptop, and then additional data scientists are brought in, and suddenly the team needs to work on a network-attached storage appliance. On the other hand, a project might start in the cloud, but then the team suddenly has 10 to 50 data scientists contributing to the project, so business leaders determine that it is more cost-effective to buy on-premises equipment for the computing, network, and storage environment. Having a strategy for managing growth and scaling the project effectively can help future-proof a company’s AI project.
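The cloud versus on-premises decision described above is essentially a break-even calculation. The Python sketch below is one simple way to frame it, assuming cloud spend scales linearly with the number of data scientists and on-premises spend is an upfront purchase plus a flat monthly operating cost. Every figure and name in it is a hypothetical placeholder; real pricing is considerably more complicated.

```python
import math
from typing import Optional


def breakeven_months(onprem_capex: float, onprem_monthly_opex: float,
                     cloud_monthly_cost: float) -> Optional[int]:
    """First whole month at which cumulative cloud spend is at least
    cumulative on-premises spend. Returns None if cloud stays cheaper."""
    monthly_saving = cloud_monthly_cost - onprem_monthly_opex
    if monthly_saving <= 0:
        return None  # on-premises never pays for itself
    return math.ceil(onprem_capex / monthly_saving)


# Hypothetical costs in dollars.
CLOUD_COST_PER_SCIENTIST = 4_000   # per month
ONPREM_CAPEX = 600_000             # one-time purchase
ONPREM_OPEX = 10_000               # per month (power, space, staff)

results = {}
for team_size in (2, 5, 50):
    cloud_monthly = CLOUD_COST_PER_SCIENTIST * team_size
    results[team_size] = breakeven_months(ONPREM_CAPEX, ONPREM_OPEX, cloud_monthly)
    print(f"{team_size:>2} data scientists: break-even month = {results[team_size]}")
```

With these made-up numbers a two-person team never recovers the purchase, while a 50-person team recovers it within months, which is why knowing the expected team size early matters for the infrastructure plan.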

Conclusion

Why is “more data” not necessarily better? Knowledge is the key.

It is possible for many AI projects to succeed without having all of the answers or without following the strategies that were laid out here. Nevertheless, the long-term success of a project requires an AI team that is willing to be flexible on infrastructure changes, willing to fine-tune its model, and forward-thinking enough to have a plan to move and store data safely and efficiently. With these plans in place, your chances of success can go beyond the 15% to 50% rates that many of today’s AI projects experience.

Over the past few weeks we explored Weka’s new insideHPC Guide:

Download the complete 10 Questions to Ask When Starting With AI courtesy of Weka.
