Introduction

Our Koios platform relies on a powerful artificial intelligence (AI) model that has been honed and improved over the last two years. Relying on cutting-edge research into speech representation, speaker recognition, natural language processing, and, of course, psychology, we created the most powerful engine on the market to predict someone’s personality from their voice.

As you surely know, our AI model leverages audio recordings to predict personality scores. While the underlying architecture is complex, it primarily works by extracting and processing two types of signals from the audio: linguistic and acoustic signals (see Figure 1).

Figure 1: Magic behind the Koios algorithm

Decoding Linguistic Signals: What is Being Said?

The first step of our model's processing journey begins with linguistic signals. These are essentially the words and phrases used by the speaker in the audio recording. The process starts with a critical cleaning and preprocessing stage, where we eliminate any background noise, pauses, or irrelevant sounds. This way, we ensure the purity and accuracy of the data, making it free from distortions or errors that could impact the subsequent analysis.

The cleaned linguistic signals are then fed into an automatic speech recognition (ASR) layer. ASR technology is designed to convert spoken language into written text. It's a crucial component of our system as it allows us to analyse the spoken words on a granular level, examining the language usage and context in detail.

Upon the successful transcription of the audio into text, the next stage involves the extraction of linguistic and contextual features. This includes a broad spectrum of attributes such as the choice of words, the complexity of sentences, the use of active or passive voice, the presence of specific terminologies or phrases, and even the context inferred from the speech.

These features help us understand the speaker's language patterns, their semantic preferences, and their contextual usage. By analysing these elements, we can draw correlations between linguistic patterns and specific personality traits.
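As a rough illustration of this feature-extraction step, the sketch below computes a handful of simple lexical features from a transcript. The helper name `extract_linguistic_features` and the four features it returns are assumptions chosen for illustration; the article does not list the actual features used in the Koios pipeline.

```python
import re


def extract_linguistic_features(transcript: str) -> dict[str, float]:
    """Compute a few toy lexical features from an ASR transcript.

    Hypothetical helper for illustration only.
    """
    # Split into sentences on ., ! or ? and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    # Lower-cased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words or not sentences:
        return {"word_count": 0.0, "avg_sentence_length": 0.0,
                "type_token_ratio": 0.0, "first_person_rate": 0.0}
    first_person = {"i", "me", "my", "mine", "myself"}
    return {
        "word_count": float(len(words)),
        # Average number of words per sentence (a crude complexity proxy).
        "avg_sentence_length": len(words) / len(sentences),
        # Share of distinct words (a crude vocabulary-richness proxy).
        "type_token_ratio": len(set(words)) / len(words),
        # Share of first-person singular pronouns.
        "first_person_rate": sum(w in first_person for w in words) / len(words),
    }


features = extract_linguistic_features("I love hiking. I hike every weekend!")
```

Each recording would yield one such row of features, and the rows together would form the linguistic dataset.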

The culmination of this process is the creation of a linguistic dataset. This dataset contains all the linguistic features extracted from the audio, serving as one of the key input sources for our final models that predict personality scores.

The Art of Listening: How is it Being Said?

While what is being said is important, how it's being said carries equal weight. The way people speak — their tone, pitch, volume, and speed — provides a wealth of information about their personalities. That's where our acoustic signal processing comes into play.

Firstly, we convert the raw audio recording into a unified format by normalising it. This normalisation process helps us maintain uniformity, ensuring that our analysis is not biased by varying sound quality or volume levels in different recordings.
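The article does not say which normalisation is applied. As one common possibility, the sketch below shows peak normalisation, which rescales every recording to the same maximum amplitude. The function name `peak_normalise` and the target level of 0.9 are assumptions.

```python
def peak_normalise(samples: list[float], target_peak: float = 0.9) -> list[float]:
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent (or empty) input: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


# Two toy recordings captured at very different volume levels.
quiet = [0.01, -0.02, 0.015]
loud = [0.4, -0.8, 0.6]

# After normalisation both recordings share the same peak level.
normalised_quiet = peak_normalise(quiet)
normalised_loud = peak_normalise(loud)
```

A production pipeline would probably also unify the sample rate and channel count, which this sketch leaves out.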

Next, like linguistic signals, acoustic signals undergo thorough cleaning and preprocessing. Here we remove any unwanted sounds like background noise or distortions that might interfere with the extraction of our desired features. This step is crucial for maintaining the quality and accuracy of the signals that we will subsequently analyse.

Once the audio is cleaned, we extract voice biomarkers from it. These biomarkers are characteristics of the speaker's voice that have been scientifically shown to be associated with specific personality traits. For instance, the pace at which someone speaks, the range of their vocal pitch, their speech rhythm, and even their momentary pauses could reveal telling insights about their personality.
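To make the idea of a voice biomarker concrete, here is a minimal sketch of one pause-related measure: the fraction of short frames whose energy falls below a silence threshold. The function names, frame size, and threshold are toy assumptions; real systems work on frames of tens of milliseconds and use more robust voice-activity detection.

```python
import math


def frame_rms(samples: list[float], frame_size: int) -> list[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def pause_ratio(samples: list[float], frame_size: int = 4,
                silence_threshold: float = 0.05) -> float:
    """Fraction of frames whose energy falls below silence_threshold."""
    energies = frame_rms(samples, frame_size)
    if not energies:
        return 0.0
    return sum(e < silence_threshold for e in energies) / len(energies)


# Toy signal: a "voiced" burst, a pause, then another burst.
signal = [0.5, -0.5, 0.5, -0.5] + [0.0] * 8 + [0.4, -0.4, 0.4, -0.4]
ratio = pause_ratio(signal)  # two of the four frames are silent
```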

To extend our analysis beyond these biomarkers, we also feed the audio recording through a set of pre-trained audio models. These models are designed to extract a wide array of additional acoustic features, enriching our understanding of the speaker's vocal profile.

All of these elements collectively form an acoustic dataset, a treasure trove of voice-based insights that complement the linguistic dataset. Together, these datasets provide a comprehensive and nuanced understanding of the speaker, which feeds into our final models that predict personality scores.

The Symphony of Signals: Crafting Personality Scores

After the diligent processing and extraction of both linguistic and acoustic signals, the resultant datasets serve as the input to our final models. These models represent the culmination of our unique approach to personality prediction, weaving together the linguistic and acoustic threads into a comprehensive profile of the speaker's personality.

Our models are designed using advanced machine learning algorithms tailored to handle the richness and complexity of linguistic and acoustic data. They sift through this information, identifying and learning from patterns that may be too subtle or complex for the human eye (or ear) to detect.

The final step is the prediction of individual personality scores. Each score is the result of an intricate dance between various linguistic and acoustic signals. For instance, the frequency of certain words or phrases (linguistic features) might interact with the volume or pitch at which they are delivered (acoustic features), painting a detailed picture of different personality traits.
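One simple way to let a model see such interactions is to merge both feature sets into a single row and add product terms. The sketch below does exactly that. The function `combine_features` and the feature names are hypothetical, and the article does not state that the Koios models use explicit product terms; many machine learning models learn interactions implicitly.

```python
def combine_features(linguistic: dict[str, float],
                     acoustic: dict[str, float]) -> dict[str, float]:
    """Merge both feature sets and add pairwise interaction terms."""
    combined = {f"ling:{name}": value for name, value in linguistic.items()}
    combined.update({f"acou:{name}": value for name, value in acoustic.items()})
    # An interaction term is the product of one feature from each set,
    # which lets a downstream model weight their joint effect.
    for l_name, l_value in linguistic.items():
        for a_name, a_value in acoustic.items():
            combined[f"ling:{l_name}*acou:{a_name}"] = l_value * a_value
    return combined


# Hypothetical feature values for a single recording.
row = combine_features({"positive_word_rate": 0.1}, {"mean_pitch_hz": 200.0})
```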

In essence, our models don't just tally scores. They capture the nuance and depth of human personality, echoing the richness of human communication. Each score reflects the unique blend of what is said and how it is said, offering an unprecedented level of insight into human personality.

Big 5 Personality Traits

The Big Five Model, also known as the Five Factor Model, is a widely accepted framework for measuring personality amongst psychologists today.₁ It comprises five dimensions: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (the opposite pole of emotional stability).₂

These are considered fundamental dimensions of personality and encompass a broad range of human behaviours and characteristics. Each trait represents a spectrum, meaning that individuals are placed on a scale for each dimension.

Openness to Experience:

Represents a person's willingness to try new things and think creatively.
- High: Imaginative, curious, open to new experiences, embraces diversity, appreciates art and innovation.
- Low: Grounded, practical, reliable, prefers routine, focused on concrete tasks and traditional approaches.

Conscientiousness:

Represents a person's tendency towards organisation and responsibility.
- High: Diligent, disciplined, dependable, achieves goals, good time management.
- Low: Flexible, spontaneous, open-minded, less constrained by rules, adaptable to change.

Extraversion:

Reflects how outgoing and sociable a person is.
- High: Energetic, sociable, assertive, enjoys social interactions, natural leader.
- Low: Reserved, introspective, independent, comfortable with solitude, good listener.

Agreeableness:

Reflects how compassionate, considerate, and cooperative a person is.
- High: Empathetic, kind, trusting, supportive, values harmonious relationships.
- Low: Assertive, independent, direct, competitive, stands firm in their beliefs.

Neuroticism:

Indicates emotional sensitivity, and relates to how a person perceives and reacts to different situations.
- High: Emotionally aware, attentive to details, cautious, thorough, strives for perfection, more likely to experience stress and anxiety.
- Low: Emotionally stable, resilient, optimistic, handles stress well, maintains composure.

By evaluating people's positions on each of these traits, we can effectively assess the unique variations in their personalities.

For more information see here

References:

1. Widiger, T. A., & Crego, C. (2019). The Five Factor Model of personality structure: an update. World Psychiatry, 18(3), 271–272. https://doi.org/10.1002/wps.20658

2. McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60(2), 175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x

Our personality types

The Koios personality assessment method combines a carefully validated psychometric instrument for measuring personality traits with data-sorting (clustering) methods. Applied to our acoustic data, these methods revealed that people tend to fall into 25 distinct personality clusters.

How did we train our model and calculate scores?

To begin our model training process, we collected data on several thousand respondents, whom we asked to complete a questionnaire composed of items from the International Personality Item Pool (IPIP). The IPIP is a validated and tested psychometric instrument for measuring personality traits (see here), based on the Five Factor Model of personality, also known as the Big Five Personality Traits. Along with the questionnaire, we asked our participants to submit a short voice recording. While we provided cues regarding what they could talk about, respondents were free to say whatever they felt comfortable with.

To calculate the final scores, we used a technique called confirmatory factor analysis (CFA) on the questionnaire data (see Brown, 2015, or Roos and Bauldry, 2021). This linked the questionnaire data to the personality traits we wanted to measure. By comparing and combining each individual’s questionnaire data with the audio signals derived from their voice recording, we could train our audio model to find patterns in a person’s voice (voice biomarkers) that accurately represent their personality traits.

While we'd like to share specific trait scores or percentiles, we found that in the majority of use cases these were not the ideal format for our end users. One alternative is the approach used by the Myers–Briggs Type Indicator, a technique that binarises traits on the median. This means that each trait is split into two categories based on whether a person's score falls above or below the median value. This approach, however, cannot capture finer differences between individual scores. For example, a person whose score for a specific trait lies barely above the median falls into the same category as a person whose score lies far above it. By our standards, this is not enough.
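The information lost by a median split is easy to see in a small sketch. The scores below are made up for illustration, and `binarise_on_median` is a hypothetical helper name.

```python
from statistics import median


def binarise_on_median(scores: list[float]) -> list[str]:
    """Label each score 'high' if it lies above the sample median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical trait scores on a 0-100 scale; the median is 50.
scores = [12.0, 35.0, 49.0, 51.0, 52.0, 98.0]
labels = binarise_on_median(scores)
```

Here the scores 51 and 98 both receive the label "high", even though one sits just above the median and the other near the top of the scale.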

Our compromise

An appealing compromise we found was to communicate the personality trait scores while also developing personality cluster types that reflect the subgroups that empirically occur most often in the population. With this method, each new user is assigned to one specific personality cluster with a clear interpretation of the personality traits that define that group.

How did we find 25 Clusters?

As depicted in the figure above, the Bayesian information criterion (BIC) reaches its minimum between 25 and 26 clusters. We decided to use 25 clusters, as the additional 26th cluster accounted for less than 1% of the total population and would therefore be used too rarely to warrant its presence.
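For readers unfamiliar with the BIC, the sketch below shows how it trades model fit against model size, and how the number of clusters with the lowest value would be picked. It assumes the clusters come from a Gaussian mixture model (the article refers to a GMM further down) with full covariance matrices. The covariance type and the helper names are assumptions.

```python
import math


def gmm_parameter_count(n_components: int, n_dims: int) -> int:
    """Free parameters of a full-covariance Gaussian mixture (assumed form)."""
    weights = n_components - 1  # mixing weights sum to 1
    means = n_components * n_dims
    covariances = n_components * n_dims * (n_dims + 1) // 2
    return weights + means + covariances


def bic(log_likelihood: float, n_parameters: int, n_samples: int) -> float:
    """Bayesian information criterion; lower values are better."""
    return n_parameters * math.log(n_samples) - 2.0 * log_likelihood


def best_component_count(bic_by_k: dict[int, float]) -> int:
    """Return the number of components with the smallest BIC."""
    return min(bic_by_k, key=bic_by_k.get)
```

In practice, one would fit a mixture for each candidate number of clusters, compute the BIC of each fit, and pass the resulting values to `best_component_count`.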

Proportion of Respondents in each Cluster

What do our clusters look like?

Interpretations of each cluster are driven by the distribution of the Big Five personality traits (Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness). In the figure below, we display the mean and 20th to 80th percentile range of each personality trait in that respective cluster.
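The per-cluster summary described here (a mean plus a 20th to 80th percentile range) can be computed with Python's standard library, as in this minimal sketch. The scores are made up, and `summarise_trait` is a hypothetical helper name.

```python
from statistics import mean, quantiles


def summarise_trait(scores: list[float]) -> dict[str, float]:
    """Mean and 20th to 80th percentile range of one trait within one cluster."""
    # quantiles(..., n=5) returns the 20th, 40th, 60th and 80th percentiles.
    cuts = quantiles(scores, n=5, method="inclusive")
    return {"mean": mean(scores), "p20": cuts[0], "p80": cuts[3]}


# Hypothetical Extraversion scores of respondents assigned to one cluster.
summary = summarise_trait([40.0, 45.0, 50.0, 55.0, 60.0, 65.0])
```

Repeating this for all five traits in every cluster would give the values needed for a figure of this kind.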

It is important to note that using a Gaussian mixture model (GMM) for clustering does not constrain our model in predicting the correct personality trait score for any user. It is simply a vehicle that empowers users to better understand the results of their personality analysis.
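The sketch below illustrates why cluster assignment leaves the predicted scores untouched: the mixture model only reads a user's scores and returns the most probable cluster. It assumes diagonal covariances to keep the example short, and all parameter values are made up; the fitted Koios mixture is not published in this article.

```python
import math


def log_gaussian_diag(x: list[float], mean: list[float], var: list[float]) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def assign_cluster(x: list[float], weights: list[float],
                   means: list[list[float]],
                   variances: list[list[float]]) -> tuple[int, list[float]]:
    """Return the most probable component and all posterior probabilities."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint)
    unnormalised = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnormalised)
    posteriors = [u / total for u in unnormalised]
    return posteriors.index(max(posteriors)), posteriors


# Two made-up clusters over two traits, and one user's predicted scores.
user_scores = [0.9, -0.2]
cluster, posteriors = assign_cluster(
    x=user_scores,
    weights=[0.5, 0.5],
    means=[[1.0, 0.0], [-1.0, 0.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

The user's scores are passed in unchanged and only a cluster index comes back, so the cluster acts as an interpretive label on top of the predicted scores.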

How do we personalise our reports?

When someone uses our service, we predict their scores and use the GMM to assign that individual to a specific cluster. We then use a pre-trained large language model (LLM) to personalise the insights and summaries, giving each user a unique personality report.

Personalisation plays an important role in the final personality report for every individual. In addition to acoustic signals, we extract and process the linguistic signals from each individual’s voice recording, analysing the words and phrases the speaker uses. Because the details users share in their recordings are interwoven into our pre-existing personality descriptions, we can create a completely unique report for every user. The report can show how the unique experiences and characteristics of a person’s life come together and why they fit into a particular personality cluster.

Thus, each personality assessment is a combination of the intricate linguistic and acoustic signals that we extract from each person’s voice: a unique blend of what is said and how it is said.

References

Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research (2nd ed.). Guilford Press.

Gerlach, M., Farb, B., Revelle, W., et al. (2018). A robust data-driven approach identifies four personality types across four large data sets. Nature Human Behaviour, 2, 735–742. https://doi.org/10.1038/s41562-018-0419-z

Roos, J. M., & Bauldry, S. (2021). Confirmatory Factor Analysis. Sage, Quantitative Applications in the Social Sciences series.