In this insideHPC Guide, “10 Questions to Ask When Starting With AI,” our friends over at WEKA offer 10 important questions to ask when starting with AI, specifically when planning for success beyond the initial stages of a project. Reasons commonly given for AI project failures include not having a plan ahead of time, not getting executive or business leadership buy-in, or failing to find the proper team to execute the project. Chasing the hot technology trend without a proper strategy often leads companies down the path of failure.
Artificial intelligence (AI) and machine learning (ML) technologies are disrupting virtually all industries globally—and AI technologies are not just being applied within robotics and vehicle automation. Companies from financial services to retail, from manufacturing to health and life sciences are seeing business improvements through insights generated by AI and ML.
#3 Where will I get my data if I don’t have it already?
If you find yourself needing more data, the next step would be to determine where you can get the data you need. Do you generate it (as in the case with our aforementioned customer questionnaire), do you buy it, or do you rent it?
For example, a medical company embarking on an AI project involving genetics might look at data in a public genome database, only for the researchers to discover that it lacks the data needed for their particular AI model, in which case they might need to conduct their own experiments. Alternatively, perhaps they need only a single piece of data in an image rather than a complete set of labeled data.
“You want to make sure you know where you will acquire the data at the starting point of the journey, but also with the understanding that this could change along the way,” Ben David says. For example, imagine a farmer who sends drones into the field to take multiple pictures and collect sensor data for tracking crops or soil moisture. Even if the farmer conducts this data discovery for a month, conditions (weather, crop growth, wildlife, etc.) change so regularly that the data collection is never truly finished. Data acquisition is not a one-and-done proposition. “You need to plan ahead for when and where you will get your next batch of data and take the steps to acquire it, often in parallel with your other work,” says Ben David.
#4 What is our organizational compute strategy: on-premises, cloud, or hybrid?
A common way to get in trouble with an AI project is to run it on a computing platform that is not aligned with the organization’s overall digital compute strategy. Knowing current and future plans helps an AI team choose the platform best suited to its AI or ML models.
“You want to take the most effective way that aligns with your organization’s strategy. It could be that your organization is heavily invested in an on-premises environment with multiple GPUs,” says Ben David. “You might as well leverage that because it would be your fastest path to success.”
AI and ML projects can find success with on-premises, cloud, or hybrid platforms as long as they align with a company’s overall strategy and won’t conflict with changes or modifications down the road. Smaller companies that start in a cloud environment because it is faster and less expensive may find that costs rise as they grow, to the point where moving to an on-premises environment makes more sense.
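The cloud-versus-on-premises trade-off described above can be roughed out with simple arithmetic. The following is a minimal sketch, not a costing method from the guide: the function name, the compounding monthly growth model, and every figure in the usage note are hypothetical assumptions chosen only for illustration.

```python
from typing import Optional


def breakeven_month(
    cloud_start: float,
    cloud_growth: float,
    onprem_upfront: float,
    onprem_monthly: float,
    horizon: int = 120,
) -> Optional[int]:
    """Return the first month (1-based) in which cumulative cloud spend
    exceeds cumulative on-premises spend, or None if that never happens
    within `horizon` months.

    Assumptions (hypothetical, for illustration only):
    - the cloud bill starts at `cloud_start` and grows by the fraction
      `cloud_growth` each month (0.05 means 5% per month);
    - on-premises costs are one upfront purchase plus a flat monthly cost.
    """
    cloud_total = 0.0
    onprem_total = float(onprem_upfront)
    cloud_cost = float(cloud_start)
    for month in range(1, horizon + 1):
        cloud_total += cloud_cost
        onprem_total += onprem_monthly
        if cloud_total > onprem_total:
            return month
        cloud_cost *= 1 + cloud_growth  # next month's cloud bill
    return None
```

As a usage example with made-up numbers, `breakeven_month(10_000, 0.0, 50_000, 5_000)` returns 11: a flat 10,000 monthly cloud bill overtakes a 50,000 purchase plus 5,000 a month in the eleventh month. A real comparison would also need staffing, power, and hardware refresh costs, which this sketch leaves out.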
#5 What is our plan to move and store the data?
Companies often discover that they have no plan for where they will store data and how they will move it as they work to process the AI models. Imagine a company with divisions all over the world, generating petabytes worth of data in multiple locations across different continents. “Do I try to process it where it was created, or do I try to move petabytes of data somehow between sites worldwide?” Ben David asks. “It’s one of the critical things that sometimes is not considered in AI projects.”
Another option is to centralize the data in a single data center, but moving data may require compressing it or physically shipping it instead of transferring it across the cloud, which can become expensive quickly. Moreover, securing the data is also an issue, and some data cannot be moved at all due to local or federal regulations. Finally, by the time the data arrives at the site of AI processing, you might find that it’s already obsolete.
“Each organization has a different answer, and they’re all correct,” says Ben David. “But if you do not think about this on day one, then you are more than likely to have a problem.”
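To make the "transfer or ship" question concrete, a back-of-the-envelope estimate of network transfer time can help. This is a minimal sketch under stated assumptions: the function names, the decimal units (1 TB = 10^12 bytes), and the `efficiency` factor standing in for protocol overhead and link contention are all hypothetical choices, not figures from the guide.

```python
def transfer_days(size_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the days needed to move `size_tb` terabytes over a link of
    `link_gbps` gigabits per second.

    `efficiency` is an assumed fraction of the nominal bandwidth that is
    actually achieved (hypothetical default of 0.7).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                  # terabytes -> bits
    usable_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    return bits / usable_bps / 86_400          # seconds -> days


def shipping_is_faster(size_tb: float, link_gbps: float,
                       shipping_days: float, efficiency: float = 0.7) -> bool:
    """True if physically shipping the data (assumed to take `shipping_days`
    door to door) beats the estimated network transfer."""
    return shipping_days < transfer_days(size_tb, link_gbps, efficiency)
```

For example, moving 1 PB (1,000 TB) over a fully utilized 10 Gbps link takes roughly 9.3 days by this estimate, so a shipment that arrives in five days would be faster. The estimate covers time only; transfer fees, encryption, and regulatory limits on moving data still have to be weighed separately.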
Additionally, companies need a strategy for retaining data for future use. In many cases, a company cannot generate data from experiments over and over again. Data from these experiments needs to be saved, stored, and secured, yet also be available for quick retrieval if needed. As mentioned, this retention set includes raw data that may seem irrelevant now but may be needed later as the AI model grows and the ability to analyze it evolves. Ben David stresses that raw data should be neither deleted nor ignored. “This notion cannot exist in an AI project,” Ben David says.
Over the next few weeks we’ll explore Weka’s new insideHPC Guide:
- Introduction, #1 Have we clearly defined a goal and identified the right questions to get us there?, #2 What data is required to achieve your goal or solve your problem?
- #3 Where will I get my data if I don’t have it already?, #4 What is our organizational compute strategy: on-premises, cloud, or hybrid?, #5 What is our plan to move and store the data?
- #6 How will we remove bias and validate our model’s results?, #7 How often will we fine-tune the models?, #8 How do we deploy a new model?
- #9 How does my infrastructure look on day 3 vs. day 300?, #10 How do we future-proof the project?, Conclusion
Download the complete 10 Questions to Ask When Starting With AI courtesy of Weka.