In this insideHPC Guide, “10 Questions to Ask When Starting With AI,” our friends over at WEKA offer 10 important questions to ask when starting with AI, with a focus on planning for success beyond the initial stages of a project. Reasons commonly given for AI project failures include not having a plan ahead of time, not getting executive or business leadership buy-in, and failing to assemble the right team to execute the project. Chasing the hot technology trend without a proper strategy often leads companies down the path of failure.
Artificial intelligence (AI) and machine learning (ML) technologies are disrupting virtually all industries globally—and AI technologies are not just being applied within robotics and vehicle automation. Companies from financial services to retail, from manufacturing to health and life sciences are seeing business improvements through insights generated by AI and ML.
#6 How will we remove bias and validate our model’s results?
After data is collected and in place, make sure you know how to validate the results that the AI or ML model generates. One way is to run the model against a known data set and check that its output closely matches the expected results.
For example, if your AI algorithm is identifying a batch of photographs and determining which include images of apples and which include images of oranges, will your model accurately identify the correct fruit? Ben David says humans can often validate answers on a simple level, but this ability doesn’t scale well when the data set includes hundreds or thousands of images. In this case, AI experts often run validations through a simulator, which can verify the AI models on a larger scale.
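The validation step described above can be sketched in a few lines. This is a minimal illustration, not the guide's method: `classify` is a stand-in for any trained model's predict function, and the file names and labels are invented.

```python
# Hypothetical validation against a known, labeled data set.
# `classify` stands in for a real trained model's predict function.

def classify(image_name: str) -> str:
    """Placeholder model: a real classifier would run inference here."""
    return "apple" if "apple" in image_name else "orange"

# Known data set: (input, expected label) pairs whose answers we already trust.
labeled_set = [
    ("apple_001.jpg", "apple"),
    ("apple_002.jpg", "apple"),
    ("orange_001.jpg", "orange"),
    ("orange_002.jpg", "orange"),
]

# Accuracy on the trusted set tells us how far to trust the model's output.
correct = sum(1 for image, expected in labeled_set if classify(image) == expected)
accuracy = correct / len(labeled_set)
print(f"accuracy: {accuracy:.0%}")
```

At scale, the same comparison is run over thousands of examples inside a simulator or evaluation harness rather than by hand.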
Furthermore, validating the results is an important step in determining whether the AI has any inherent biases built into the model. One well-known example was when Amazon discovered that a resume-screening application was not rating candidates for software developer jobs and other technical positions in a gender-neutral way. Because the models were trained to choose applicants by observing patterns in resumes submitted over a 10-year period, and most of those resumes came from men (who, at the time, dominated the field), the application inherited that imbalance.
When evaluating your AI models, be sure to have a strategy for spotting and eliminating bias, or the results you end up with could be skewed and affect the project’s credibility.
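One simple way to spot the kind of skew described above is to compare the model's positive-outcome rate across groups. This is a rough sketch with invented data; the 20% gap threshold is a project-specific choice, not a standard.

```python
from collections import defaultdict

# Hypothetical screening results: (group, model_decision) pairs.
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

totals = defaultdict(int)
positives = defaultdict(int)
for group, accepted in decisions:
    totals[group] += 1
    if accepted:
        positives[group] += 1

# Selection rate per group: a large gap is a red flag worth investigating.
rates = {g: positives[g] / totals[g] for g in totals}
gap = max(rates.values()) - min(rates.values())
print(rates)
if gap > 0.2:  # illustrative threshold, set per project
    print(f"warning: selection-rate gap of {gap:.0%} across groups")
```

A check like this catches only the simplest form of bias, but it makes the problem visible early, before skewed results can damage the project's credibility.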
#7 How often will we fine-tune the models?
Because much of AI and ML is implemented in software, developers often adopt a “set it and forget it” approach, which can be disastrous with this technology. Fine-tuning involves not only being ready to change the model regularly, but also understanding how practitioners can adjust different variables within the model to achieve different results.
Some AI models, for example, will provide results based on your data and also explain how they arrived at those results. Others, however, simply produce results and leave it to the data scientists to figure out why, a gap that has given rise to what many data scientists refer to as “explainable AI.” “Any AI project is always a work in progress,” says Ben David. Creating and deploying a model that can provide good reasons for its decisions is an important step in building trust in the model.
Fine-tuning (and deciding whether or not to deploy a new model, discussed in Question #8) is often triggered by discovering that you have “bad data.” In general, bad data is data that has not been “cleaned up”: it contains missing fields or duplications, or values are in the wrong format, such as dates written as free text instead of a standard date format.
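The cleanliness checks just listed (missing fields, duplicates, malformed dates) can be sketched as a simple scan over raw records. The records and field names here are invented for illustration.

```python
from datetime import datetime

# Hypothetical raw records; field names are illustrative only.
records = [
    {"id": 1, "name": "Ada", "hired": "2015-03-02"},
    {"id": 2, "name": "", "hired": "March 2, 2015"},   # missing field, text date
    {"id": 1, "name": "Ada", "hired": "2015-03-02"},   # duplicate record
]

def is_iso_date(value: str) -> bool:
    """True only if the value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

seen_ids = set()
problems = []
for rec in records:
    if not rec["name"]:
        problems.append((rec["id"], "missing name"))
    if not is_iso_date(rec["hired"]):
        problems.append((rec["id"], "date not in expected format"))
    if rec["id"] in seen_ids:
        problems.append((rec["id"], "duplicate id"))
    seen_ids.add(rec["id"])

print(problems)
```

In practice this kind of scan is usually done with a data-quality or ETL tool rather than hand-written loops, but the categories of problems it flags are the same.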
But even clean data can be bad if it is too narrow or embeds biases, such as the problems seen in facial recognition or the gender bias discovered in Amazon’s resume-screening application. The data may have appeared good initially but turned out to be bad when the algorithm kept eliminating female resumes, because the model didn’t account for the smaller number of female resumes in the historical data. What looked like an error in the mathematical algorithm actually indicated an error in the data set: the historical data was not broad enough.
The best way to determine whether your data is good or bad is to first make sure that it is clean, and then check that it is broad enough to produce unbiased results.
#8 How do we deploy a new model?
With a model that is fine-tuned on a regular basis, companies then need a strategy for deploying new AI models that can better answer the original questions, or for pursuing new questions that emerge from the results they are seeing.
For example, at some point data scientists may decide to move to a different neural network for their AI model or algorithm, which might require building something new rather than fine-tuning or modifying an older model. Many of these decisions depend on the specific algorithms or goals a company is pursuing, but the question of how to deploy a new model, should the need arise, belongs on any AI team’s radar.
Some may think that acquiring more data is a way to fine-tune models or produce better outcomes, but this can be a trap for many companies. If the data is not good to start with, adding more of it will not suddenly solve the problem. When people suggest that getting more data will help, they often mean that you need to acquire a broader data set that meets high quality standards.
In a 2018 article for Harvard Business Review, Thomas C. Redman, president of Data Quality Solutions, said good data must be right in two ways:
- It must be correct, properly labeled, de-duplicated, etc.
- It must be right for you.
Earlier this year, Redman also spoke about how companies often waste critical resources in dealing with bad data in an MIT Sloan Management Review article. “Bad data, in turn, breeds mistrust in the data, further slowing efforts to create advantage,” he said.
Over the next few weeks we’ll explore WEKA’s new insideHPC Guide:
- Introduction, #1 Have we clearly defined a goal and identified the right questions to get us there?, #2 What data is required to achieve your goal or solve your problem?
- #3 Where will I get my data if I don’t have it already?, #4 What is our organizational compute strategy: on-premises, cloud, or hybrid?, #5 What is our plan to move and store the data?
- #6 How will we remove bias and validate our model’s results?, #7 How often will we fine-tune the models?, #8 How do we deploy a new model?
- #9 How does my infrastructure look on day 3 vs. day 300?, #10 How do we future-proof the project?, Conclusion
Download the complete 10 Questions to Ask When Starting With AI courtesy of WEKA.