10 Questions to Ask When Starting With AI – Part 2


In this insideHPC Guide, “10 Questions to Ask When Starting With AI,” our friends over at WEKA offer 10 important questions to ask when starting with AI, specifically planning for success beyond the initial stages of a project. Reasons given for AI project failures include not having a plan ahead of time, not getting executive or business leadership buy-in, or failing to find the proper team to execute the project. Chasing the hot technology trend without having a proper strategy often leads companies down the path of failure.

Artificial intelligence (AI) and machine learning (ML) technologies are disrupting virtually all industries globally, and AI technologies are not just being applied within robotics and vehicle automation. Companies from financial services to retail, from manufacturing to health and life sciences are seeing business improvements through insights generated by AI and ML.

#3 Where will I get my data if I don’t have it already?

If you find yourself needing more data, the next step would be to determine where you can get the data you need. Do you generate it (as in the case with our aforementioned customer questionnaire), do you buy it, or do you rent it?

For example, a medical company embarking on an AI project involving genetics might look at data in a public genome database, but then the researchers might discover that it does not contain the data needed for their particular AI model, in which case they might need to conduct their own experiments. Alternatively, perhaps they need only a single piece of data in an image rather than a complete set of labeled data.

“You want to make sure you know where you will acquire the data at the starting point of the journey, but also with the understanding that this could change along the way,” Ben David says. For example, imagine a farmer who sends drones out in the field to take many pictures and collect sensor data for tracking crops or soil moisture. Even if the farmer conducts this data discovery for a month, conditions change on a regular basis (weather, crop growth, wildlife, etc.) to the extent that the data collection is never truly finished. Data acquisition is not a one-and-done proposition. “You need to plan ahead for when and where you will get your next batch of data and take the steps to acquire it, often in parallel with your other work,” says Ben David.

#4 What is our organizational compute strategy: on-premises, cloud, or hybrid?

A common way to get in trouble with an AI project is to run it on a computing platform that is not aligned with the organization’s overall digital compute strategy. Knowing the organization’s current and future plans can help an AI team choose the right platform for its AI or ML models.

“You want to take the most effective way that aligns with your organization’s strategy. It could be that your organization is heavily invested in an on-premises environment with multiple GPUs,” says Ben David. “You might as well leverage that because it would be your fastest path to success.”

AI and ML projects can find success with on-premises, cloud, or hybrid platforms as long as they align with a company’s overall strategy and won’t conflict with changes or modifications down the road. Smaller companies that start with a cloud environment because it’s faster and less expensive may find that the costs become larger as they grow, to the point where it makes more sense to move to an on-premises environment.
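The cloud-versus-on-premises cost crossover described above can be estimated with simple arithmetic. The following is a minimal sketch, not a pricing model: the function name `breakeven_month` and every figure in the example (starting cloud bill, growth rate, hardware purchase price, on-premises running cost) are hypothetical assumptions chosen for illustration, not numbers from the guide.

```python
def breakeven_month(cloud_start, cloud_growth, onprem_upfront, onprem_monthly,
                    max_months=120):
    """Return the first month in which cumulative cloud spend reaches or
    exceeds cumulative on-premises spend, or None if that never happens
    within max_months.

    All parameters are illustrative assumptions:
      cloud_start    -- cloud bill in month 1
      cloud_growth   -- fractional month-over-month growth of the cloud bill
      onprem_upfront -- one-time hardware and setup cost
      onprem_monthly -- recurring on-premises cost (power, staff, support)
    """
    cloud_total = 0.0
    cloud_cost = float(cloud_start)
    for month in range(1, max_months + 1):
        cloud_total += cloud_cost
        onprem_total = onprem_upfront + onprem_monthly * month
        if cloud_total >= onprem_total:
            return month
        # The cloud bill grows as the company's workload grows.
        cloud_cost *= 1 + cloud_growth
    return None


if __name__ == "__main__":
    # Hypothetical figures: a flat 10,000/month cloud bill versus a
    # 120,000 purchase plus 4,000/month to run it.
    print(breakeven_month(10_000, 0.0, 120_000, 4_000))   # 20
    # The same scenario with the cloud bill growing 5% per month
    # crosses over earlier.
    print(breakeven_month(10_000, 0.05, 120_000, 4_000))
```

With a flat cloud bill the crossover in this example arrives at month 20; any growth in cloud usage pulls it earlier, which is the pattern the paragraph above describes for companies that start small in the cloud.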

#5 What is our plan to move and store the data?

Companies often discover that they never planned where they would store data or how they would move it as they work to process the AI models. Imagine a company with divisions all over the world, generating petabytes worth of data in multiple locations across different continents. “Do I try to process it where it was created, or do I try to move petabytes of data somehow between sites worldwide?” Ben David asks. “It’s one of the critical things that sometimes is not considered in AI projects.”

Another option is to centralize the data in a single data center, but moving data may require compressing it or physically shipping it instead of transferring it across the cloud, which can become expensive quickly. Moreover, keeping the data secure is also an issue, and some data cannot be moved at all due to local or federal regulations. Finally, by the time the data arrives at the site of AI processing, you might find that it’s already obsolete.
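A back-of-the-envelope calculation shows why shipping physical media can beat a network transfer at petabyte scale. This is a rough sketch under stated assumptions: the function names, the 10 Gbit/s link, the 80% effective-throughput factor, and the 5-day shipping time are all hypothetical values for illustration, not figures from the guide.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes, decimal definition


def transfer_days(dataset_bytes, link_bits_per_second, efficiency=0.8):
    """Estimate the days needed to move a dataset over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved after protocol overhead and contention.
    """
    seconds = dataset_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


def prefer_shipping(dataset_bytes, link_bits_per_second, shipping_days,
                    efficiency=0.8):
    """Return True when physically shipping the data is estimated to be
    faster than sending it over the link."""
    network_days = transfer_days(dataset_bytes, link_bits_per_second, efficiency)
    return shipping_days < network_days


if __name__ == "__main__":
    ten_gbit = 10 * 10**9
    # 1 PB over a 10 Gbit/s link at 80% efficiency: about 11.6 days.
    print(round(transfer_days(PETABYTE, ten_gbit), 1))
    # With an assumed 5-day courier turnaround, shipping wins.
    print(prefer_shipping(PETABYTE, ten_gbit, shipping_days=5))
```

The estimate ignores compression, encryption, and the time needed to load and unload the media, so it only indicates the order of magnitude involved when weighing the two options.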

“Each organization has a different answer, and they’re all correct,” says Ben David. “But if you do not think about this on day one, then you are more than likely to have a problem.”

Additionally, companies need a strategy for retaining data for future use. In many cases, a company cannot generate data from experiments over and over again. Data from these experiments needs to be saved, stored, and secured, yet also be available for quick retrieval if needed. As mentioned, this retention set includes raw data that may seem irrelevant now but may be needed later as the AI model grows and the ability to analyze it evolves. Ben David stresses that raw data should be neither deleted nor ignored. “This notion cannot exist in an AI project,” Ben David says.

Over the next few weeks we’ll explore Weka’s new insideHPC Guide:

Download the complete 10 Questions to Ask When Starting With AI courtesy of Weka.