Myers-Briggs Personality Descriptions? That’s Kinda Weird for an NLP Project, Isn’t it?
Yeah, maybe it is a bit unexpected. It was one of those days where I was far down the Reddit rabbit hole and discovered how much people talk about Myers-Briggs personalities (yes, there is an entire subreddit dedicated to INTJs). I had taken a test long ago (I’m an INTJ), but had since forgotten the descriptions for the other types. After a bit of googling, I found the 16 canonical descriptions on the Myers-Briggs homepage. I then jumped to the INTJ description:
“Have original minds and great drive for implementing their ideas and achieving their goals. Quickly see patterns in external events and develop long-range explanatory perspectives. When committed, organize a job and carry it through. Skeptical and independent, have high standards of competence and performance - for themselves and others.”
Curious whether any of this language overlapped with the other descriptions, I did a quick Ctrl-F to find ‘drive’. Nope, not in any other description. How about ‘high standards’? Nope, nada. Ok… I said to myself, at least ‘high standards’ and ‘drive’ are unique to the INTJ description! Then I stopped to think.
I realized there could be words that were unique to each personality type. Surely everyone would like to get a quick overview of how their personality stands out from the others.
Sure, I could go through each description manually with the same process, Ctrl-F’ing every single word. But that would be extremely painstaking and boring. At nearly the same time, I realized I could write a quick algorithm with some NLP aspects that would be great to put here on NLP Champs! I came up with some neat results, and it opened the door for a further tool which is in the pipeline.
Getting Started
I did a quick copy and paste of all the descriptions from the Myers-Briggs homepage and put them into a Python dictionary:
# declare dict otherwise Python will rage at us
dMyersBriggs = {}
# the dict values - directly from http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/the-16-mbti-types.htm?bhcp=1
dMyersBriggs['ISTJ'] = 'Quiet, serious, earn success by thoroughness and dependability. Practical, matter-of-fact, realistic, and responsible. Decide logically what should be done and work toward it steadily, regardless of distractions. Take pleasure in making everything orderly and organized - their work, their home, their life. Value traditions and loyalty.'
dMyersBriggs['ISFJ'] = 'Quiet, friendly, responsible, and conscientious. Committed and steady in meeting their obligations. Thorough, painstaking, and accurate. Loyal, considerate, notice and remember specifics about people who are important to them, concerned with how others feel. Strive to create an orderly and harmonious environment at work and at home.'
dMyersBriggs['INFJ'] = 'Seek meaning and connection in ideas, relationships, and material possessions. Want to understand what motivates people and are insightful about others. Conscientious and committed to their firm values. Develop a clear vision about how best to serve the common good. Organized and decisive in implementing their vision.'
dMyersBriggs['INTJ'] = 'Have original minds and great drive for implementing their ideas and achieving their goals. Quickly see patterns in external events and develop long-range explanatory perspectives. When committed, organize a job and carry it through. Skeptical and independent, have high standards of competence and performance - for themselves and others.'
dMyersBriggs['ISTP'] = 'Tolerant and flexible, quiet observers until a problem appears, then act quickly to find workable solutions. Analyze what makes things work and readily get through large amounts of data to isolate the core of practical problems. Interested in cause and effect, organize facts using logical principles, value efficiency.'
dMyersBriggs['ISFP'] = 'Quiet, friendly, sensitive, and kind. Enjoy the present moment, whats going on around them. Like to have their own space and to work within their own time frame. Loyal and committed to their values and to people who are important to them. Dislike disagreements and conflicts, do not force their opinions or values on others.'
dMyersBriggs['INFP'] = 'Idealistic, loyal to their values and to people who are important to them. Want an external life that is congruent with their values. Curious, quick to see possibilities, can be catalysts for implementing ideas. Seek to understand people and to help them fulfill their potential. Adaptable, flexible, and accepting unless a value is threatened.'
dMyersBriggs['INTP'] = 'Seek to develop logical explanations for everything that interests them. Theoretical and abstract, interested more in ideas than in social interaction. Quiet, contained, flexible, and adaptable. Have unusual ability to focus in depth to solve problems in their area of interest. Skeptical, sometimes critical, always analytical.'
dMyersBriggs['ESTP'] = 'Flexible and tolerant, they take a pragmatic approach focused on immediate results. Theories and conceptual explanations bore them - they want to act energetically to solve the problem. Focus on the here-and-now, spontaneous, enjoy each moment that they can be active with others. Enjoy material comforts and style. Learn best through doing.'
dMyersBriggs['ESFP'] = 'Outgoing, friendly, and accepting. Exuberant lovers of life, people, and material comforts. Enjoy working with others to make things happen. Bring common sense and a realistic approach to their work, and make work fun. Flexible and spontaneous, adapt readily to new people and environments. Learn best by trying a new skill with other people.'
dMyersBriggs['ENFP'] = 'Warmly enthusiastic and imaginative. See life as full of possibilities. Make connections between events and information very quickly, and confidently proceed based on the patterns they see. Want a lot of affirmation from others, and readily give appreciation and support. Spontaneous and flexible, often rely on their ability to improvise and their verbal fluency.'
dMyersBriggs['ENTP'] = 'Quick, ingenious, stimulating, alert, and outspoken. Resourceful in solving new and challenging problems. Adept at generating conceptual possibilities and then analyzing them strategically. Good at reading other people. Bored by routine, will seldom do the same thing the same way, apt to turn to one new interest after another.'
dMyersBriggs['ESTJ'] = 'Practical, realistic, matter-of-fact. Decisive, quickly move to implement decisions. Organize projects and people to get things done, focus on getting results in the most efficient way possible. Take care of routine details. Have a clear set of logical standards, systematically follow them and want others to also. Forceful in implementing their plans.'
dMyersBriggs['ESFJ'] = 'Warmhearted, conscientious, and cooperative. Want harmony in their environment, work with determination to establish it. Like to work with others to complete tasks accurately and on time. Loyal, follow through even in small matters. Notice what others need in their day-by-day lives and try to provide it. Want to be appreciated for who they are and for what they contribute.'
dMyersBriggs['ENFJ'] = 'Warm, empathetic, responsive, and responsible. Highly attuned to the emotions, needs, and motivations of others. Find potential in everyone, want to help others fulfill their potential. May act as catalysts for individual and group growth. Loyal, responsive to praise and criticism. Sociable, facilitate others in a group, and provide inspiring leadership.'
dMyersBriggs['ENTJ'] = 'Frank, decisive, assume leadership readily. Quickly see illogical and inefficient procedures and policies, develop and implement comprehensive systems to solve organizational problems. Enjoy long-term planning and goal setting. Usually well informed, well read, enjoy expanding their knowledge and passing it on to others. Forceful in presenting their ideas.'
Okay, now we need to clean up the text before we can reliably find unique words. Let's get started.
So, it's clear we need to loop over all the personalities in our defined dictionary and do stuff (basically cleaning) to them, like so:
for sPersonality, sDescription in dMyersBriggs.items():
I converted each description to lowercase:
sDescription = sDescription.lower()
Then I wanted to remove the punctuation in the descriptions: periods, commas, and the ol' ' - ' that the authors sometimes used as a dash in the middle of sentences. I wanted to get rid of those dashes since I was concerned they would affect the stop word removal process.
sDescription = sDescription.replace(".", " ")
sDescription = sDescription.replace(" - ", " ")
sDescription = sDescription.replace(",", " ")
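The chained replace() calls work for these descriptions, but they only catch the three patterns they name. A slightly more general sketch, assuming we want to keep hyphenated words like 'long-range' and 'here-and-now' intact, uses a regular expression to pull the words out directly. The pattern and the tokenize helper are my own illustration, not part of the original code:

```python
import re

# Lowercase words, optionally joined by single hyphens ("long-range",
# "here-and-now"). Everything else (periods, commas, spaced dashes,
# apostrophes, digits) is treated as a separator and dropped.
WORD_PATTERN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def tokenize(text):
    """Return the lowercase words of text with punctuation stripped."""
    return WORD_PATTERN.findall(text.lower())


print(tokenize("Focus on the here-and-now, spontaneous - enjoy each moment."))
# ['focus', 'on', 'the', 'here-and-now', 'spontaneous', 'enjoy', 'each', 'moment']
```

Because a lone ' - ' has no letters on either side of the hyphen, the pattern never matches it, so spaced dashes disappear without a dedicated replace().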
Then we want to break the text into a list of words and remove the stop words from it. We use NLTK's English stop word list for this:
# convert string to list of words
lDescription = sDescription.split()
# remove stop words
lDescription = [sWord for sWord in lDescription if sWord not in stopwords.words('english')]
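One small thing worth noting: the comprehension above calls stopwords.words('english') once for every word and then searches that list linearly. A sketch of the usual fix, assuming the stop words are loaded once and stored in a set before the loop. The tiny STOP_WORDS set below is a hand-picked stand-in so the snippet runs without the NLTK corpus; with NLTK you would build it with set(stopwords.words('english')):

```python
# Hand-picked stand-in for illustration only; the real script would use
# set(stopwords.words('english')), built once outside the loop.
STOP_WORDS = frozenset({"and", "the", "of", "to", "their", "have", "for", "in"})


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Drop stop words; membership in a set is a hash lookup, not a list scan."""
    return [word for word in words if word not in stop_words]


print(remove_stop_words(["have", "original", "minds", "and", "great", "drive"]))
# ['original', 'minds', 'great', 'drive']
```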
Finally, we write the cleaned list to the dCleanedMyersBriggs dict, deduplicating it with set() along the way:
dCleanedMyersBriggs[sPersonality] = list(set(lDescription)) # unique only
All together, our tasty loop looks like the following:
dCleanedMyersBriggs = {}  # holds the cleaned word list for each personality
setStopWords = set(stopwords.words('english'))  # load the stop words once, as a set
for sPersonality, sDescription in dMyersBriggs.items():
    # convert to lower
    sDescription = sDescription.lower()
    # remove '.', ' - ', and ','
    sDescription = sDescription.replace(".", " ")
    sDescription = sDescription.replace(" - ", " ")
    sDescription = sDescription.replace(",", " ")
    # convert string to list of words
    lDescription = sDescription.split()
    # remove stop words
    lDescription = [sWord for sWord in lDescription if sWord not in setStopWords]
    # finally throw the tastiness into the 'cleaned' dict - converted to set for unique only, and back to a list for the list type which will prove useful later
    dCleanedMyersBriggs[sPersonality] = list(set(lDescription))  # unique only
Great. At this point we have each personality as a clean, deduplicated list of words with no stop words. For INTJ, the dictionary entry looks like this:
In [1]: dCleanedMyersBriggs['INTJ']
Out[1]:
['organize',
'competence',
'high',
'see',
'committed',
'carry',
'standards',
'develop',
'perspectives',
'explanatory',
'ideas',
'quickly',
'performance',
'achieving',
'independent',
'implementing',
'job',
'external',
'others',
'skeptical',
'great',
'long-range',
'drive',
'minds',
'events',
'patterns',
'original',
'goals']
Looking pretty good... let's get to finding the unique words!
Unique Word Algorithm
Now it got a bit difficult, and I had to think about the following process for a little bit. Again, I am interested in words that show up in one and only one of ALL the descriptions. The algorithm still needs to work if a word appears 2 or 3 times in a single personality's description but in none of the others.
I ended up simply creating nested loops. I spent a little while brainstorming a possible solution using Python sets and their intersection() method, but just wasn't in the mood to dive into something totally new: I was too interested in the results (note: before today I had zero experience with sets in Python! I know, sad...). It's clear the first loop will involve both the keys and values of the dict:
for sPersonality, lDescription in dCleanedMyersBriggs.items():
Then we need to loop over each word in the current description:
for sWord in lDescription:
Now it gets a bit tricky. We need to test whether a given word from one description appears in any of the others, so I defined a loop nearly identical to the dictionary loop above, but with variable names including "inner":
for sInnerPersonality, lInnerDescription in dCleanedMyersBriggs.items():
In this innermost loop, we skip the word's own description, since the word obviously occurs there:
if sPersonality == sInnerPersonality:
    continue  # don't search itself
Otherwise, the code checks whether the word occurs in that other description, and if it does, we set a boolean flag to True:
if sWord in lInnerDescription:
    bFoundInOtherDescription = True
Then, outside of this loop (meaning we have searched all the other personality descriptions), we check whether the word was found anywhere else. If not, we append it to the list of words unique to this personality, and finally we store each personality's list in a new dictionary called dUniqueMyersBriggs.
if not bFoundInOtherDescription:  # it is only unique if no other description contains it
    lWords.append(sWord)
So, the final 'unique-word search algorithm' looks like this:
dUniqueMyersBriggs = {}  # holds the unique words for each personality
print("Words unique to personality:")
for sPersonality, lDescription in dCleanedMyersBriggs.items():
    # loop over each word in this personality's description
    lWords = []  # build a list of unique words for each personality
    for sWord in lDescription:
        bFoundInOtherDescription = False  # reset the flag for this word
        # loop over all other personality descriptions
        for sInnerPersonality, lInnerDescription in dCleanedMyersBriggs.items():
            if sPersonality == sInnerPersonality:
                continue  # don't search itself
            if sWord in lInnerDescription:
                bFoundInOtherDescription = True
                break  # one match in another description is enough
        if not bFoundInOtherDescription:  # it is only unique if no other description contains it
            lWords.append(sWord)
    print(sPersonality + " " + str(len(lWords)))  # word counts unique for description
    dUniqueMyersBriggs[sPersonality] = lWords
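For comparison, here is a sketch of the set-based route I mentioned skipping. It uses set difference rather than intersection(): a word is unique to a description if it is not in the union of all the other descriptions. The function name unique_words_by_key and the toy data are purely illustrative:

```python
def unique_words_by_key(cleaned):
    """For each key, return the sorted words that appear under no other key.

    cleaned maps a label (e.g. 'INTJ') to an iterable of words.
    """
    word_sets = {key: set(words) for key, words in cleaned.items()}
    unique = {}
    for key, words in word_sets.items():
        # Union of every *other* description's words.
        others = set().union(*(w for k, w in word_sets.items() if k != key))
        unique[key] = sorted(words - others)
    return unique


toy = {
    "A": ["quiet", "loyal", "practical"],
    "B": ["loyal", "warm", "practical"],
    "C": ["warm", "curious"],
}
print(unique_words_by_key(toy))
# {'A': ['quiet'], 'B': [], 'C': ['curious']}
```

It should give the same words as the nested loop when passed dCleanedMyersBriggs, just sorted instead of in the arbitrary order set() leaves behind.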
If you check out dUniqueMyersBriggs, the lists are still fairly long. So I did one last thing to narrow them down to what I called "descriptive" words. First I tokenized the text:
lTokenText = nltk.word_tokenize(" ".join(lDescription)) # tokenize text
and then I tagged it with NLTK's part-of-speech (POS) tagger:
lTaggedText = nltk.pos_tag(lTokenText) # tag tokens
To me, the most interesting parts of speech in the personality descriptions are the adverbs (part-of-speech tags RB, RBR, RBS) and the adjectives (part-of-speech tags JJ, JJR, JJS). You can view the full list of tags (the Penn Treebank tag set, which NLTK's pos_tag uses by default) in this paper.
if tTaggedText[1] in ("RB", "RBR", "RBS", "JJ", "JJR", "JJS"):  # adverbs and adjectives
    lWords.append(tTaggedText[0])
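Testing tag membership in a set keeps the list of tags in one readable place and makes it easy to audit. A minimal sketch, assuming the input is the list of (word, tag) tuples that nltk.pos_tag returns; the sample tags below are hand-written, not real tagger output, and descriptive_words is an illustrative name:

```python
# Penn Treebank tags for adjectives (JJ, JJR, JJS) and adverbs (RB, RBR, RBS).
DESCRIPTIVE_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}


def descriptive_words(tagged_tokens):
    """Keep the words whose tag is an adjective or adverb tag.

    tagged_tokens is a list of (word, tag) tuples, the format nltk.pos_tag returns.
    """
    return [word for word, tag in tagged_tokens if tag in DESCRIPTIVE_TAGS]


# Hand-written tags for illustration; real tags would come from nltk.pos_tag.
sample = [("quickly", "RB"), ("see", "VB"), ("original", "JJ"), ("minds", "NNS")]
print(descriptive_words(sample))  # ['quickly', 'original']
```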
Again, I generate yet another new dict for these 'descriptive' unique words:
dDescriptiveMyersBriggs[sPersonality] = lWords
Resulting in a total loop structure that looks like this:
dDescriptiveMyersBriggs = {}  # holds the 'descriptive' unique words for each personality
print("'Descriptive' words for each personality:")
for sPersonality, lDescription in dUniqueMyersBriggs.items():
    lWords = []
    lTokenText = nltk.word_tokenize(" ".join(lDescription))  # tokenize text
    lTaggedText = nltk.pos_tag(lTokenText)  # tag tokens
    for tTaggedText in lTaggedText:
        # keep adverbs (RB, RBR, RBS) and adjectives (JJ, JJR, JJS)
        if tTaggedText[1] in ("RB", "RBR", "RBS", "JJ", "JJR", "JJS"):
            lWords.append(tTaggedText[0])
    dDescriptiveMyersBriggs[sPersonality] = lWords
    print(sPersonality + " " + str(len(lWords)))  # word counts for 'descriptive' words for description
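One caveat: pos_tag uses the surrounding words to choose a tag, so tagging a bag of isolated unique words joined with spaces gives it little context to work with. That may explain verbs like 'decide', 'strive', and 'give' showing up among the 'descriptive' words in the results below. A sketch of an alternative, assuming we tag each original description as running text and then keep only the tagged words that are in that description's unique list. The tagger is passed in as a parameter (with NLTK it would be nltk.pos_tag); fake_tagger is a hypothetical stand-in so the snippet runs without NLTK's models:

```python
import re


def descriptive_unique_words(description, unique_words, tag_tokens,
                             keep_tags=("JJ", "JJR", "JJS", "RB", "RBR", "RBS")):
    """Tag the full description in context, then keep unique adjectives/adverbs.

    tag_tokens: any callable mapping a list of tokens to (token, tag) pairs.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", description.lower())
    unique = set(unique_words)
    found = []
    for word, tag in tag_tokens(tokens):
        if word in unique and tag in keep_tags and word not in found:
            found.append(word)
    return found


def fake_tagger(tokens):
    """Hypothetical stand-in tagger: a tiny lookup table, defaulting to 'NN'."""
    table = {"decide": "VB", "logically": "RB", "serious": "JJ"}
    return [(token, table.get(token, "NN")) for token in tokens]


text = "Quiet, serious. Decide logically what should be done."
print(descriptive_unique_words(text, ["serious", "decide", "logically"], fake_tagger))
# ['serious', 'logically']
```

Whether a real tagger actually fixes those particular words would need checking against NLTK's output; the sketch only shows where the context comes from.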
Closing Notes, Comments, and Stuff
At some point I began to think this analysis was technical enough that it might expose the shortcomings of the description writers, i.e., would it reveal a bias toward particular words that they reused across many of the descriptions? But I was actually surprised by the results. Each description is decently unique against the others: for most of them, the unique word count is roughly 20-40% of the full description's word count.
Word Counts - Total
ENFJ | 51 |
ESFP | 54 |
INFJ | 47 |
ESTJ | 52 |
ISTJ | 46 |
ENTJ | 48 |
ISFP | 55 |
INTJ | 49 |
ISTP | 48 |
ENTP | 49 |
ISFJ | 47 |
INTP | 46 |
ESFJ | 60 |
ESTP | 52 |
ENFP | 54 |
INFP | 54 |
Word Counts - Unique
ENFJ | 18 |
ESFP | 12 |
INFP | 5 |
INFJ | 9 |
ISTJ | 15 |
ENTJ | 21 |
ESTJ | 12 |
INTJ | 15 |
ISTP | 17 |
ENTP | 19 |
ISFJ | 14 |
ISFP | 14 |
ESFJ | 17 |
ESTP | 9 |
ENFP | 19 |
INTP | 13 |
Word Counts - 'Descriptive' Unique
ENFJ | 6 |
ESFP | 1 |
INFJ | 2 |
ISTJ | 5 |
ENTJ | 8 |
ESTJ | 3 |
INTJ | 6 |
ISTP | 2 |
ENTP | 4 |
ISFJ | 5 |
ISFP | 3 |
ESFJ | 7 |
ESTP | 4 |
INTP | 9 |
ENFP | 8 |
INFP | 2 |
You can verify these counts in your IPython terminal, or by running the final code, which prints them. In some cases the share of unique words is even higher: ENTJ reaches about 44% (21 unique words out of 48). So hats off to the writers of these descriptions: they really do target a specific audience.
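To put the count tables side by side, here is a quick calculation of each type's unique-word share, using only the numbers from the "Total" and "Unique" tables above. Note the denominator is the raw word count (stop words included), while the numerator counts deduplicated, stop-word-free words, so this is a rough ratio rather than a like-for-like comparison:

```python
# Word counts copied from the "Total" and "Unique" tables above.
total = {"ENFJ": 51, "ESFP": 54, "INFJ": 47, "ESTJ": 52, "ISTJ": 46, "ENTJ": 48,
         "ISFP": 55, "INTJ": 49, "ISTP": 48, "ENTP": 49, "ISFJ": 47, "INTP": 46,
         "ESFJ": 60, "ESTP": 52, "ENFP": 54, "INFP": 54}
unique = {"ENFJ": 18, "ESFP": 12, "INFP": 5, "INFJ": 9, "ISTJ": 15, "ENTJ": 21,
          "ESTJ": 12, "INTJ": 15, "ISTP": 17, "ENTP": 19, "ISFJ": 14, "ISFP": 14,
          "ESFJ": 17, "ESTP": 9, "ENFP": 19, "INTP": 13}

# Share of each raw description that survived as a unique word.
share = {mbti: unique[mbti] / total[mbti] for mbti in total}
for mbti, ratio in sorted(share.items(), key=lambda item: item[1], reverse=True):
    print(f"{mbti} {ratio:.0%}")
```

With these counts, ENTJ comes out highest (about 44%) and INFP lowest (about 9%), and 12 of the 16 types fall between 20% and 40%.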
Conclusions and Final Results - What you've all been waiting for!
So, the algorithm ended up working just fine. I tested the results (below) against the actual descriptions from the website and for the words I tested, it checks out.
One drawback I noticed is that I should perhaps extend the algorithm from single words to shared n-grams. I noticed this because for INTJ, 'high' was indeed found as a unique word, but 'standards' was not (ESTJ's description mentions 'logical standards'), so the phrase 'high standards' was lost. A bigram algorithm would have been able to discern that the phrase 'high standards' is unique across all the descriptions. However, there would be an upper limit to the useful n-gram length: as n grows, nearly every phrase becomes unique simply because long word sequences rarely repeat, so the results would include phrases that are unique in wording but not truly unique in meaning to a single personality description.
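A minimal sketch of that bigram idea, generalized to any n, assuming the texts are plain strings keyed by label and that stop words are kept (so a phrase like 'high standards' survives intact). The helper names are illustrative, the two texts are short excerpts of the INTJ and ESTJ descriptions, and this simple version lets an n-gram run across a sentence boundary:

```python
import re


def ngrams(text, n):
    """Return the set of n-word tuples in text (lowercased, punctuation stripped)."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def unique_ngrams(texts, n):
    """Map each label to the n-grams that occur in no other text."""
    grams = {label: ngrams(text, n) for label, text in texts.items()}
    return {
        label: own - set().union(*(g for other, g in grams.items() if other != label))
        for label, own in grams.items()
    }


texts = {
    "INTJ": "Skeptical and independent, have high standards of competence.",
    "ESTJ": "Have a clear set of logical standards, systematically follow them.",
}
print(("standards",) in unique_ngrams(texts, 1)["INTJ"])         # False
print(("high", "standards") in unique_ngrams(texts, 2)["INTJ"])  # True
```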
Look for another post in the future along these lines: I think I may refactor this process into an abstracted function that can be run on any n texts and produce a similar output, namely a list of the words that appear only in their parent text. A long-term goal would be generating an n-way Venn diagram for the n text samples, clearly showing which words are shared and which are unique to each sample. That could even be a nice tool for writers who want to quickly check their writing for language repeated across specific paragraphs, chapters, etc. All in good time.
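As a rough sketch of the data such a tool would need, assuming each text is a plain string keyed by a label: group every word by the exact set of texts it appears in. Each group is one region of an n-way Venn diagram; the single-label regions are the unique words from this post, and the all-labels region holds the words every text shares. Drawing the diagram itself is left out, and venn_regions is a hypothetical name:

```python
import re
from collections import defaultdict


def venn_regions(texts):
    """Group words by the exact set of texts they appear in.

    Each key is a frozenset of labels (one region of an n-way Venn diagram);
    each value is the set of words found in exactly those texts.
    """
    owners = defaultdict(set)
    for label, text in texts.items():
        for word in set(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())):
            owners[word].add(label)
    regions = defaultdict(set)
    for word, labels in owners.items():
        regions[frozenset(labels)].add(word)
    return dict(regions)


regions = venn_regions({
    "A": "quiet and loyal",
    "B": "loyal and warm",
    "C": "warm and curious",
})
print(regions[frozenset({"A", "B", "C"})])  # {'and'}
print(regions[frozenset({"A"})])            # {'quiet'}
```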
With that said, check out the unique words my algorithm pulled out:
Unique Words
ENFJ | criticism, everyone, highly, individual, responsive, empathetic, needs, group, emotions, may, inspiring, warm, sociable, motivations, attuned, growth, praise, facilitate |
ESFP | exuberant, environments, bring, sense, happen, skill, lovers, working, adapt, outgoing, trying, fun |
INFP | curious, unless, idealistic, threatened, congruent |
INFJ | relationships, possessions, insightful, firm, serve, meaning, motivates, connection, vision |
ISTJ | distractions, regardless, traditions, logically, thoroughness, dependability, loyalty, decide, pleasure, earn, success, serious, steadily, making, toward |
ENTJ | expanding, planning, organizational, illogical, informed, setting, systems, comprehensive, knowledge, inefficient, read, frank, usually, procedures, passing, goal, assume, well, presenting, policies, long-term |
ESTJ | move, set, decisions, efficient, also, details, plans, possible, projects, care, getting, systematically |
INTJ | competence, high, carry, perspectives, explanatory, performance, achieving, independent, job, great, long-range, drive, minds, original, goals |
ISTP | principles, solutions, facts, amounts, isolate, workable, analyze, cause, core, effect, observers, efficiency, using, data, appears, large, makes |
ENTP | ingenious, outspoken, challenging, apt, one, analyzing, another, resourceful, turn, solving, bored, generating, alert, adept, reading, seldom, strategically, thing, stimulating |
ISFJ | feel, accurate, specifics, concerned, thorough, obligations, painstaking, meeting, considerate, strive, remember, steady, harmonious, create |
ISFP | opinions, whats, frame, within, force, space, going, dislike, disagreements, around, conflicts, present, kind, sensitive |
ESFJ | tasks, contribute, cooperative, need, establish, even, appreciated, harmony, complete, day-by-day, accurately, matters, determination, warmhearted, try, small, lives |
ESTP | pragmatic, bore, theories, style, here-and-now, immediate, focused, active, energetically |
ENFP | imaginative, often, verbal, give, connections, fluency, information, rely, based, proceed, lot, improvise, confidently, full, warmly, appreciation, affirmation, enthusiastic, support |
INTP | abstract, unusual, area, contained, theoretical, interests, interaction, always, sometimes, depth, critical, social, analytical |
'Descriptive' Unique Words
ENFJ | highly, individual, responsive, empathetic, warm, sociable |
ESFP | exuberant |
INFJ | insightful, firm |
ISTJ | logically, thoroughness, decide, serious, steadily |
ENTJ | organizational, illogical, comprehensive, frank, usually, assume, well, long-term |
ESTJ | also, possible, systematically |
INTJ | high, explanatory, independent, great, long-range, original |
ISTP | workable, large |
ENTP | ingenious, resourceful, seldom, strategically |
ISFJ | accurate, concerned, strive, steady, harmonious |
ISFP | dislike, present, sensitive |
ESFJ | cooperative, even, appreciated, complete, day-by-day, accurately, small |
ESTP | pragmatic, here-and-now, active, energetically |
INTP | abstract, unusual, theoretical, always, sometimes, depth, critical, social, analytical |
ENFP | imaginative, often, verbal, give, rely, confidently, full, enthusiastic |
INFP | curious, idealistic |
Meh, tl;dr... just show me the code!
The final code (without all my rambling in between, but still with the delicious comments) looks like this:
# imports
import pprint
import nltk
from nltk.corpus import stopwords
# pretty print settings
pp = pprint.PrettyPrinter(indent=4)
# English stop words, loaded once as a set for fast lookups
setStopWords = set(stopwords.words('english'))
# declare dicts otherwise Python will rage at us
dMyersBriggs = {}
dCleanedMyersBriggs = {}
dUniqueMyersBriggs = {}
dDescriptiveMyersBriggs = {}
# the dict values - text directly from http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics/the-16-mbti-types.htm?bhcp=1
dMyersBriggs['ISTJ'] = 'Quiet, serious, earn success by thoroughness and dependability. Practical, matter-of-fact, realistic, and responsible. Decide logically what should be done and work toward it steadily, regardless of distractions. Take pleasure in making everything orderly and organized - their work, their home, their life. Value traditions and loyalty.'
dMyersBriggs['ISFJ'] = 'Quiet, friendly, responsible, and conscientious. Committed and steady in meeting their obligations. Thorough, painstaking, and accurate. Loyal, considerate, notice and remember specifics about people who are important to them, concerned with how others feel. Strive to create an orderly and harmonious environment at work and at home.'
dMyersBriggs['INFJ'] = 'Seek meaning and connection in ideas, relationships, and material possessions. Want to understand what motivates people and are insightful about others. Conscientious and committed to their firm values. Develop a clear vision about how best to serve the common good. Organized and decisive in implementing their vision.'
dMyersBriggs['INTJ'] = 'Have original minds and great drive for implementing their ideas and achieving their goals. Quickly see patterns in external events and develop long-range explanatory perspectives. When committed, organize a job and carry it through. Skeptical and independent, have high standards of competence and performance - for themselves and others.'
dMyersBriggs['ISTP'] = 'Tolerant and flexible, quiet observers until a problem appears, then act quickly to find workable solutions. Analyze what makes things work and readily get through large amounts of data to isolate the core of practical problems. Interested in cause and effect, organize facts using logical principles, value efficiency.'
dMyersBriggs['ISFP'] = 'Quiet, friendly, sensitive, and kind. Enjoy the present moment, whats going on around them. Like to have their own space and to work within their own time frame. Loyal and committed to their values and to people who are important to them. Dislike disagreements and conflicts, do not force their opinions or values on others.'
dMyersBriggs['INFP'] = 'Idealistic, loyal to their values and to people who are important to them. Want an external life that is congruent with their values. Curious, quick to see possibilities, can be catalysts for implementing ideas. Seek to understand people and to help them fulfill their potential. Adaptable, flexible, and accepting unless a value is threatened.'
dMyersBriggs['INTP'] = 'Seek to develop logical explanations for everything that interests them. Theoretical and abstract, interested more in ideas than in social interaction. Quiet, contained, flexible, and adaptable. Have unusual ability to focus in depth to solve problems in their area of interest. Skeptical, sometimes critical, always analytical.'
dMyersBriggs['ESTP'] = 'Flexible and tolerant, they take a pragmatic approach focused on immediate results. Theories and conceptual explanations bore them - they want to act energetically to solve the problem. Focus on the here-and-now, spontaneous, enjoy each moment that they can be active with others. Enjoy material comforts and style. Learn best through doing.'
dMyersBriggs['ESFP'] = 'Outgoing, friendly, and accepting. Exuberant lovers of life, people, and material comforts. Enjoy working with others to make things happen. Bring common sense and a realistic approach to their work, and make work fun. Flexible and spontaneous, adapt readily to new people and environments. Learn best by trying a new skill with other people.'
dMyersBriggs['ENFP'] = 'Warmly enthusiastic and imaginative. See life as full of possibilities. Make connections between events and information very quickly, and confidently proceed based on the patterns they see. Want a lot of affirmation from others, and readily give appreciation and support. Spontaneous and flexible, often rely on their ability to improvise and their verbal fluency.'
dMyersBriggs['ENTP'] = 'Quick, ingenious, stimulating, alert, and outspoken. Resourceful in solving new and challenging problems. Adept at generating conceptual possibilities and then analyzing them strategically. Good at reading other people. Bored by routine, will seldom do the same thing the same way, apt to turn to one new interest after another.'
dMyersBriggs['ESTJ'] = 'Practical, realistic, matter-of-fact. Decisive, quickly move to implement decisions. Organize projects and people to get things done, focus on getting results in the most efficient way possible. Take care of routine details. Have a clear set of logical standards, systematically follow them and want others to also. Forceful in implementing their plans.'
dMyersBriggs['ESFJ'] = 'Warmhearted, conscientious, and cooperative. Want harmony in their environment, work with determination to establish it. Like to work with others to complete tasks accurately and on time. Loyal, follow through even in small matters. Notice what others need in their day-by-day lives and try to provide it. Want to be appreciated for who they are and for what they contribute.'
dMyersBriggs['ENFJ'] = 'Warm, empathetic, responsive, and responsible. Highly attuned to the emotions, needs, and motivations of others. Find potential in everyone, want to help others fulfill their potential. May act as catalysts for individual and group growth. Loyal, responsive to praise and criticism. Sociable, facilitate others in a group, and provide inspiring leadership.'
dMyersBriggs['ENTJ'] = 'Frank, decisive, assume leadership readily. Quickly see illogical and inefficient procedures and policies, develop and implement comprehensive systems to solve organizational problems. Enjoy long-term planning and goal setting. Usually well informed, well read, enjoy expanding their knowledge and passing it on to others. Forceful in presenting their ideas.'
for sPersonality, sDescription in dMyersBriggs.items():
    print(sPersonality + " " + str(len(sDescription.split())))  # word counts raw description
for sPersonality, sDescription in dMyersBriggs.items():
    # convert to lower
    sDescription = sDescription.lower()
    # remove '.', ' - ', and ','
    sDescription = sDescription.replace(".", " ")
    sDescription = sDescription.replace(" - ", " ")
    sDescription = sDescription.replace(",", " ")
    # convert string to list of words
    lDescription = sDescription.split()
    # remove stop words
    lDescription = [sWord for sWord in lDescription if sWord not in setStopWords]
    # finally throw the tastiness into the 'cleaned' dict - converted to set for unique only, and back to a list for the list type which will prove useful later
    dCleanedMyersBriggs[sPersonality] = list(set(lDescription))  # unique only
print("Words unique to personality:")
for sPersonality, lDescription in dCleanedMyersBriggs.items():
    # loop over each word in this personality's description
    lWords = []  # build a list of unique words for each personality
    for sWord in lDescription:
        bFoundInOtherDescription = False  # reset the flag for this word
        # loop over all other personality descriptions
        for sInnerPersonality, lInnerDescription in dCleanedMyersBriggs.items():
            if sPersonality == sInnerPersonality:
                continue  # don't search itself
            if sWord in lInnerDescription:
                bFoundInOtherDescription = True
                break  # one match in another description is enough
        if not bFoundInOtherDescription:  # it is only unique if no other description contains it
            lWords.append(sWord)
    print(sPersonality + " " + str(len(lWords)))  # word counts unique for description
    dUniqueMyersBriggs[sPersonality] = lWords
pp.pprint(dUniqueMyersBriggs)  # pretty print the dict
print("'Descriptive' words for each personality:")
for sPersonality, lDescription in dUniqueMyersBriggs.items():
    lWords = []
    lTokenText = nltk.word_tokenize(" ".join(lDescription))  # tokenize text
    lTaggedText = nltk.pos_tag(lTokenText)  # tag tokens
    for tTaggedText in lTaggedText:
        # keep adverbs (RB, RBR, RBS) and adjectives (JJ, JJR, JJS)
        if tTaggedText[1] in ("RB", "RBR", "RBS", "JJ", "JJR", "JJS"):
            lWords.append(tTaggedText[0])
    dDescriptiveMyersBriggs[sPersonality] = lWords
    print(sPersonality + " " + str(len(lWords)))  # word counts for 'descriptive' words for description
pp.pprint(dDescriptiveMyersBriggs)
Hope you all enjoyed! Cheers everyone! 🍺
-Chris