
10 Questions to Ask When Starting With AI – Part 3

In this insideHPC Guide, “10 Questions to Ask When Starting With AI,” our friends over at WEKA offer 10 important questions to ask when starting with AI, with a focus on planning for success beyond a project’s initial stages. Many AI projects fail, and the reasons commonly cited include not having a plan ahead of time, not getting executive or business leadership buy-in, and failing to assemble the proper team to execute the project. Chasing a hot technology trend without a proper strategy often leads companies down the path of failure.

Artificial intelligence (AI) and machine learning (ML) technologies are disrupting virtually all industries globally, and AI is not confined to robotics and vehicle automation. Companies from financial services to retail, from manufacturing to health and life sciences, are seeing business improvements through insights generated by AI and ML.

#6 How will we remove bias and validate our model’s results?

After data is collected and in place, make sure you know how to validate the results the AI or ML model is generating. One way is to run the model against a known data set and check that its output matches the expected results with a high level of accuracy.

For example, if your AI algorithm is sorting a batch of photographs into those containing apples and those containing oranges, will your model accurately identify the correct fruit? Ben David says humans can often validate answers on a simple level, but this ability doesn’t scale well when the data set includes hundreds or thousands of images. In that case, AI experts often run validations through a simulator, which can verify the AI models on a larger scale.
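The known-data-set check described above can be sketched as follows. The “model” here is a trivial stand-in rule, and the feature names are invented for illustration; in practice you would load your trained model instead:

```python
# Minimal sketch: validate a model against a labeled "known" data set
# and measure how often its answers match the expected results.

def toy_model(image_features):
    # Hypothetical stand-in for a real classifier:
    # pick the fruit whose color channel dominates.
    return "apple" if image_features["red"] > image_features["orange"] else "orange"

# Known data set: features plus ground-truth labels (illustrative values).
known_set = [
    ({"red": 0.9, "orange": 0.2}, "apple"),
    ({"red": 0.1, "orange": 0.8}, "orange"),
    ({"red": 0.7, "orange": 0.3}, "apple"),
    ({"red": 0.2, "orange": 0.9}, "orange"),
]

correct = sum(1 for features, label in known_set if toy_model(features) == label)
accuracy = correct / len(known_set)
print(f"accuracy on known set: {accuracy:.0%}")
```

A real validation would use a much larger held-out set than four records, but the shape of the check is the same: compare predictions against known answers and track the accuracy over time.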

Furthermore, validating the results is an important step in determining whether the AI has any inherent biases built into the model. One well-known example came when Amazon discovered that a resume-screening application was not rating candidates for software developer jobs and other technical positions in a gender-neutral way. Because the models were trained to choose applicants by observing patterns in resumes submitted over a 10-year period, and most of those resumes came from men (who, at the time, dominated the field), the model learned to favor male candidates.

When evaluating your AI models, be sure to have a strategy for spotting and eliminating bias, or the results you end up with could be skewed and affect the project’s credibility.
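One simple way to start spotting the kind of skew described above is to compare the model’s selection rate across groups. The records, group labels, and warning threshold below are illustrative assumptions, not a standard fairness test:

```python
# Bias spot-check sketch: compare positive-outcome rates across groups.
# A large gap flags potential bias worth investigating in the training data.

decisions = [  # illustrative model decisions
    {"group": "A", "selected": True},
    {"group": "A", "selected": True},
    {"group": "A", "selected": False},
    {"group": "B", "selected": False},
    {"group": "B", "selected": True},
    {"group": "B", "selected": False},
]

def selection_rate(records, group):
    in_group = [r for r in records if r["group"] == group]
    return sum(r["selected"] for r in in_group) / len(in_group)

rate_a = selection_rate(decisions, "A")
rate_b = selection_rate(decisions, "B")
gap = abs(rate_a - rate_b)
print(f"group A: {rate_a:.0%}, group B: {rate_b:.0%}, gap: {gap:.0%}")
if gap > 0.2:  # illustrative threshold, not a regulatory standard
    print("Warning: large selection-rate gap; review the training data for bias.")
```

A gap by itself does not prove bias, but it is a cheap signal that the model and its training data deserve a closer look, exactly the kind of check the Amazon example would have benefited from.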

#7 How often will we fine-tune the models?

Because much of AI and ML is software-based, developers often adopt a “set it and forget it” approach, which can be disastrous for this technology. Fine-tuning involves not only being ready to change the model regularly, but also understanding how practitioners can adjust different variables within the model to achieve different results.

Some AI models, for example, will provide results based on your data and also explain how they arrived at those results. Others, however, simply spit out results and leave it to the data scientists to figure out why, a gap that has given rise to what many data scientists refer to as “explainable AI.” “Any AI project is always a work in progress,” says Ben David. Creating and executing on a model that can provide good reasons for its decisions is an important step in building trust in the model.

Fine-tuning (and deciding whether to deploy a new model, discussed in Question #8) is often prompted by discovering that you have “bad data.” In general, bad data is data that has not been “cleaned up”: it contains missing fields or duplications, or its values are in the wrong format, such as dates stored as free text instead of a proper date type.
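The bad-data symptoms listed above (missing fields, duplicates, and dates stored as text) can be screened with a short sketch like this; the record layout and field names are assumptions for illustration:

```python
# Sketch of a "bad data" screen: drop duplicates, reject records with
# missing fields, and reject dates that are free text rather than a
# parseable date format.
from datetime import datetime

raw_records = [
    {"id": 1, "name": "Ada",   "hired": "2021-03-15"},
    {"id": 1, "name": "Ada",   "hired": "2021-03-15"},    # duplicate
    {"id": 2, "name": "",      "hired": "2020-07-01"},    # missing field
    {"id": 3, "name": "Grace", "hired": "July 1, 2020"},  # date as free text
]

def is_clean(record):
    if not record["name"]:  # reject records with missing fields
        return False
    try:
        datetime.strptime(record["hired"], "%Y-%m-%d")  # enforce date format
    except ValueError:
        return False
    return True

seen_ids = set()
clean = []
for rec in raw_records:
    if rec["id"] in seen_ids:  # drop duplicates by id
        continue
    if is_clean(rec):
        seen_ids.add(rec["id"])
        clean.append(rec)

print(f"kept {len(clean)} of {len(raw_records)} records")
```

Screens like this catch the mechanical problems; the broader question of whether the surviving data is representative enough, as the next paragraphs discuss, still requires human judgment.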

But even clean data can be considered bad if it is too specific or carries biases, such as the problems seen in facial recognition or the gender bias discovered in Amazon’s resume-screening application. The data may have appeared good initially but turned out to be bad after the algorithm kept eliminating female resumes, because the model didn’t account for the smaller number of female resumes in the historical data. This apparent error in the mathematical algorithm actually indicated an error in the data set: the historical data was not broad enough.

The best way to determine whether your data is good or bad is to first make sure that it is clean, and then check that it is broad enough to produce unbiased results.

#8 How do we deploy a new model?

With a model that is fine-tuned on a regular basis, companies then need a strategy for deploying new AI models that can better answer the original questions, or for pursuing new questions raised by the results they are seeing.

For example, at some point data scientists may decide to move to a different neural network for their AI model or algorithm, which might require building something new rather than fine-tuning or modifying an older model. Many of these decisions depend on the specific algorithms or goals a company is aiming to achieve, but an AI team should keep the question of how to deploy a new model on its radar should the need arise later.

Some may think that acquiring more data is a way to fine-tune models or create better outcomes, but this can be a trap for many companies. If the data is not good to start with, adding more of it will not suddenly solve the problem. When people suggest that getting more data will help, what they usually mean is acquiring a broader data set that meets high quality standards.

In a 2018 article for Harvard Business Review, Thomas C. Redman, president of Data Quality Solutions, said  good data must be right in two ways:

  1. It must be correct, properly labeled, de-duplicated, etc.
  2. It must be right for you.

Earlier this year, Redman also spoke about how companies often waste critical resources in dealing with bad data in an MIT Sloan Management Review article. “Bad data, in turn, breeds mistrust in the data, further slowing efforts to create advantage,” he said.

Over the next few weeks we’ll explore Weka’s new insideHPC Guide:

Download the complete 10 Questions to Ask When Starting With AI courtesy of Weka.
