Editor’s note: Emily James is a marketing content writer at marketing research firm FlexMR, London. This is an edited version of a post that originally appeared under the title, “Practical steps to addressing algorithmic bias.”
Any type of bias within market research can result in inaccurate data; it is a researcher’s job to mitigate and minimize the impact of bias when it presents itself, correcting the method to make sure that the results are accurate and reliable. One place you might not expect to find bias is within the supposedly mathematically pure logic of AI and automated machine learning.
AI and machine learning strategies are becoming increasingly prominent within market research, predominantly in data collection and analysis. Text analysis and pattern-spotting are common uses for AI and machine learning techniques, which are often used to handle big data. Collecting and analyzing large quantities of data in order to visualize a clearer model of consumer behavior is one of the most compelling arguments for greater machine learning investment. It frees up researchers to apply insights where needed to help businesses make informed decisions and meet their objectives. Sounds ideal, right?
Unfortunately, using machine learning and AI for tasks traditionally undertaken by researchers is a concept that is still in development, no matter how many organizations have adopted the practice. There are still plenty of kinks to work out before AI and machine learning algorithms can be used to their full potential without the need for human supervision, and algorithmic bias is one of the most significant.
Algorithmic bias
Algorithmic bias is where a machine learning or AI algorithm develops the same biases as humans when it comes to collecting, categorizing, producing and interpreting data. The issue arises for a number of reasons, but the most common stems from the initial programming of the algorithm: the data used to train the algorithm is often incomplete or biased to start with, which leads to poorly calibrated models that can only produce biased results.
Despite this, machine learning algorithms are working their way into all industry sectors, from banking to health care and retail, influencing us in many ways. Siri, Google Maps and Netflix are prime examples of how machine learning algorithms have become essential in our daily lives. Google Maps improves its service by identifying street names and house numbers in the photos taken by Street View cars and using them to increase the accuracy of its results. Another Google algorithm reduces travel times by analyzing the speed of traffic through anonymized location data from smartphones and adjusting the suggested route in favor of the fastest option.
Algorithms are used every day by insights professionals to produce insights from collected data. However, algorithmic bias is a very real danger and can skew results based on the subconscious biases of the programmers and the training data sets. For an algorithm to do its job without bias, it needs to be trained on diverse sets of data. In short, it is we who create the biases within algorithms through our subconsciously (or, in some cases, consciously) discriminatory views.
Within the insights industry, algorithmic bias can upend research goals by producing skewed results or insights from biased programming. This places the onus on both those training machine learning or AI technologies and the researchers using them to spot and minimize bias where it occurs.
Identifying bias
Bias cannot be addressed if no one knows it’s there. Fortunately, researchers have come up with ways to detect whether an algorithm is biased. The complexity of algorithms makes it near impossible for anyone without a software engineering background to alter the code itself, so the best approach is to examine what data goes in and what comes out and identify any issues there. The phrase “garbage in, garbage out” is particularly applicable in this scenario.
The output of a machine learning algorithm reflects the input no matter what goes on in between. If the input is biased toward a certain race, gender or religion, then the results will reflect that. If unexpected results come out, it’s worth investigating whether similar patterns emerge with other data sets to determine whether the results are accurate or whether inherent bias is present.
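To make that check concrete, a simple audit might compare the rate of positive predictions across demographic groups and flag large gaps. The sketch below is illustrative only: the record format, the predict() function and the 0.8 threshold (a common rule of thumb) are assumptions for the example, not part of any particular research tool.

```python
from collections import defaultdict

def positive_rate_by_group(records, predict):
    """Share of positive predictions for each demographic group.

    records: iterable of (features, group_label) pairs (assumed format).
    predict: the model under audit, returning True/False per record.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for features, group in records:
        counts[group][0] += 1 if predict(features) else 0
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}

def flag_disparity(rates, threshold=0.8):
    """Flag any group whose rate falls below 80% of the highest rate."""
    highest = max(rates.values())
    return {group: rate / highest < threshold for group, rate in rates.items()}
```

Running the same audit on several different data sets, as suggested above, helps distinguish a genuine pattern in the data from a bias baked into the model.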
One example of algorithmic bias comes in the form of recommender algorithms, such as those found on retail and streaming Web sites and social media channels. These algorithms typically work in the background, so many people don’t realize they interact with them on a daily basis. Recommender algorithms record which adverts and articles people click on and use that history to refine future suggestions. Such records can be extremely helpful to researchers looking for a clearer picture of consumer behavior.
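As a rough illustration of that feedback loop, the toy sketch below simply ranks items by how often they have been clicked before, so whatever early users clicked shapes what later users are shown. The class and method names are hypothetical; production recommenders are far more sophisticated, but the feedback principle is the same.

```python
from collections import Counter

class ClickRecommender:
    """Toy recommender: suggest whatever has been clicked most so far."""

    def __init__(self, catalogue):
        self.catalogue = list(catalogue)
        self.clicks = Counter()

    def record_click(self, item):
        # Every click feeds back into future suggestions.
        self.clicks[item] += 1

    def recommend(self, n=3):
        # Items with more past clicks rise to the top, so the behavior of
        # earlier audiences shapes what later users are shown.
        return sorted(self.catalogue, key=lambda item: -self.clicks[item])[:n]
```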
Recommender algorithms aren’t free from bias, as they can learn a lot about a person just from the generic information people enter into Web sites or social media. Using this information, algorithms have been known to develop prejudices within their advertising against gender, race, disabilities, etc., which compromises the ethical values of market research. But these prejudices are a product of the biased data sets the algorithms initially learned from. If results favor one gender or one type of person over everyone else, that should alert us to the possibility that bias is interfering with the system.
Addressing algorithmic bias
With machine learning algorithm technology still at an early stage, taking steps to reduce unwanted bias is part of a new area of research. Unfortunately, that means there is not yet a cure, so to speak, for algorithmic bias. However, having a human present to make sure every effort is being made to avoid bias is an important step.
An obvious step to reducing bias is to nip it in the bud. As Max Pagels states, “[The algorithm is] not socially biased, it’s just a bunch of numbers.” The data sets used in the initial programming stages need to be constructed so as to avoid including social biases. This allows the algorithm to provide diverse outputs without compromising on quality.
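One simple way to construct a more balanced training set, sketched below under the assumption that each training record carries a group label, is to downsample every group to the size of the smallest one before training. This is only one rebalancing technique among many, and it does not remove bias that is encoded elsewhere in the features themselves.

```python
import random
from collections import defaultdict

def balance_by_group(records, group_key, seed=0):
    """Return a training set with an equal number of records per group.

    records: list of dicts; group_key: the field holding the group label
    (both assumed formats for this illustration).
    """
    random.seed(seed)
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)
    smallest = min(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(random.sample(rows, smallest))
    random.shuffle(balanced)
    return balanced
```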
However, with algorithms that have already been created and distributed, we need to take different steps to address the situation. One step is to make sure researchers are fully aware of the bias within an algorithm. An algorithm should only be considered for use once it has been closely scrutinized for bias. Any bias found should be recorded and taken into account when using the algorithm in the future, so that future data can be manually adjusted. This increased awareness among all parties involved means there is a higher chance of the results being used responsibly and informing important decisions more accurately.
Techniques have been devised to mitigate the bias within algorithms through the use of standardized tools such as the Geena Davis Inclusion Quotient (GD-IQ), developed by the Geena Davis Institute on Gender in Media. Mitigating in this sense means accurately informing algorithms of the bias that exists within society, so that more accurate insights can be generated based on the reality of the situation. According to the institute, the GD-IQ is “the only tool in existence that has the ability to measure screen and speaking time through the use of automation.”
The tool measures the ratio of screen and speaking time of men to women in films and TV in order to address gender inequality within the film industry, giving the industry the “power to uncover unconscious gender bias … [to therefore correct and create a media industry that] is more representative of our society [and] fosters a more inclusive industry.” It also helps better inform algorithms that analyze films and TV episodes for any reason, allowing the inequality to be recognized and improving the algorithms’ predictions with data that reflects reality.
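For illustration only (this is not the GD-IQ itself), the measurement the tool reports boils down to each group’s share of total speaking time, which could be computed from dialogue segments that have already been attributed to a gender, along these lines:

```python
def speaking_time_share(segments):
    """segments: iterable of (label, duration_in_seconds) pairs (assumed format)."""
    totals = {}
    for label, duration in segments:
        totals[label] = totals.get(label, 0.0) + duration
    overall = sum(totals.values())
    return {label: time / overall for label, time in totals.items()}

# Example: 70 minutes of male dialogue vs. 30 minutes of female dialogue
print(speaking_time_share([("male", 4200), ("female", 1800)]))
# {'male': 0.7, 'female': 0.3}
```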
Adjusting for bias
Algorithmic bias means that algorithms become biased based on inaccurate data that is not representative of the real world. This is a problem for many reasons, but in the world of business the main one is that the results will also be biased and will therefore inaccurately inform business decisions. If researchers are aware of an algorithm’s bias in advance, they can adjust for it within their reports, inform others of the bias the results were based on and provide insights that better inform decisions. Since AI and machine learning algorithms are still in the early stages of development, so too are the solutions. Researchers must be careful when using such solutions and always keep an open but skeptical eye on results.