Creating Markov Chains for various characters of South Park with React and RiTaJS

Lots of fun with this project :)

Posted on November 01, 2017

Indeed, South Park is hilarious…

tl;dr; - the app is here: https://south-park-markov-chains.nlp-champs.com

So, what better way to learn some new NLP techniques than to combine the sometimes drab aspects of NLP with the hilarity of our friends from South Park?

I had read some neat articles about Markov Chains, including this very cool one about simulating wine recommendations, and this site about creating a Markov Chain for Trump tweets. That one is very funny - you can generate ‘new’ tweets from Trump using a Markov Chain that is built from all his old tweets. I had originally learned about Markov Chains in a Twitter bot course on egghead.io (that is NOT an affiliate link (seriously, it’s not) - and as of November 6th, 2017, when this post was published, it was a course for pro users anyway - sorry to anyone who doesn’t have pro!).

Anyways, while thinking about Twitter bots, my thought process was like this: dang, Twitter bots are so annoying. What is possibly the most annoying thing in the world? THAT would make a great Twitter bot! Hmmmmm… ha! I know! CARTMAN! Then I stopped for a moment, thinking that I didn’t want to bring such an annoying Twitter bot into an already sometimes very annoying internet, but instead COULD make a small web app for it!

So, indeed, much to the benefit of the rest of the world, I did not bring yet another annoying bot to Twitter, but instead made a nice subdomain here on NLP Champs for all of you to enjoy! Check it out!

A quick display of the user process on the South Park Markov Chains website.

So, if you’ve already tooled around on the app, without further ado, let’s get to the technicals!

Note: if you’re interested in the backend development, jump down to the backend section here. Likewise, if you’re interested in the frontend development, jump down to the frontend section here. For code samples and the full code itself, jump down to the code samples section here.

Backend

I wrote the backend stuff first, so that’s what we’ll dive into… first 😂

BACKend BACKground

I knew I wanted the backend to handle three simple parameters: 1. the character to generate the sentence for, 2. the season of South Park to pull that character’s text from, and 3. the Markov Chain length (more on what exactly that is later). As it turns out, generating the files of each character’s lines per season was what took the most time - let’s look into it.

Defining The Characters

I chose to focus on the four main characters (Cartman, Kyle, Stan, and Kenny), with some of my favorites added in (Chef, Mr. Mackey, and PC Principal). I’m satisfied with how the app is now, but if you want more characters, feel free to leave a comment down below. Or, if you’re super motivated, between the repo and this post, you should be able to modify the backend and frontend enough to do it yourself! Clone the repo and get hacking!

The Seasons

Seasons 1 - 19

Getting each character’s lines from seasons 1 through 19 was a cinch thanks to a great South Park data GitHub repository by BobAdamsEE. I cloned the repo and put the by-season data into the folder /data/original/ in my own project. CSV works nicely with DataFrames from Python’s pandas library, so for each season, I import the CSV into a DataFrame:

import pandas as pd
for sSeasonNumber in lSeasonNumbers:
  df = pd.read_csv('./data/original/Season-' + sSeasonNumber + '.csv') # read in data

To get the lines for each character, we can simply filter by the ‘Character’ column for each character we are interested in, skipping any character who has no lines in that season:

for sCharacter in lCharacters:
  df = pd.read_csv('./data/original/Season-' + sSeasonNumber + '.csv') # read in data
  df = df[df['Character'] == sCharacter] # overwrite itself with the filtered dataframe
  if len(df) == 0:
      continue # don't write a file if there are no lines for the character in this season!

And then we export the new DataFrame as a .txt file to the neighbouring folder /data/processed/, named in the format s[season]_[character].txt:

df.to_csv('./data/processed/s' + sSeasonNumber + '_' + sCharacter.replace(" ","").lower() + '.txt' , header=None, index=None, sep=' ', mode='a') # write that data to a raw text for our markov chains

So, all together (and much thanks to the neatly organized CSV file), the total data preparation Python script looks like this:

processcsv.py

import pandas as pd

lSeasonNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19']
lCharacters = ['Cartman', 'Kyle', 'Stan', 'Kenny', 'Butters', 'PC Principal', 'Chef', 'Tweak', 'Craig', 'Mr. Mackey']

for sSeasonNumber in lSeasonNumbers:
    for sCharacter in lCharacters:
        df = pd.read_csv('./data/original/Season-' + sSeasonNumber + '.csv') # read in data
        df = df[df['Character'] == sCharacter] # overwrite itself with the filtered dataframe
        if len(df) == 0:
            continue # don't write a file if there are no lines for the character in this season!
        df.to_csv('./data/processed/s' + sSeasonNumber + '_' + sCharacter.replace(" ","").lower() + '.txt' , header=None, index=None, sep=' ', mode='a') # write that data to a raw text for our markov chains

Notice that we have to load the CSV into a DataFrame anew for each character, because the DataFrame gets overwritten during the character filter step.
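If the repeated loading bothers you, one alternative is to read each season file once and bucket the lines by character in a single pass. Here is a minimal sketch of that idea using Python's standard csv module instead of pandas. It is not what the project does. It assumes the CSV has 'Character' and 'Line' columns ('Line' is my assumption about the header name), and the helper name group_lines_by_character and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def group_lines_by_character(csv_file, characters):
    """Return {character: [line, ...]} for the requested characters.

    Assumes 'Character' and 'Line' columns; 'Line' is an assumed
    column name, so adjust it to match the real header.
    """
    wanted = set(characters)
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["Character"] in wanted:
            grouped[row["Character"]].append(row["Line"].strip())
    return dict(grouped)


# Tiny in-memory stand-in for one season file (made-up rows).
sample_csv = io.StringIO(
    "Season,Episode,Character,Line\n"
    '1,1,Cartman,"Line one from Cartman"\n'
    '1,1,Kyle,"Line one from Kyle"\n'
    '1,1,Cartman,"Line two from Cartman"\n'
    '1,1,Narrator,"Someone we did not ask for"\n'
)
lines_by_character = group_lines_by_character(sample_csv, ["Cartman", "Kyle", "Kenny"])
```

Characters with no lines in the season simply never show up in the result, which plays the same role as the `continue` in the script above.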

Season 20

Season 20 was a lot less fancy. It was actually the first season I got data from, since I was very interested in generating sentences for the memberberries 😂. For Season 20, I copied the scripts from the awesome South Park Wikia (yes, by hand - I decided it would have taken me longer to write a web scraper than to just quickly grab the scripts from all 8 episodes). Copying and pasting that into a text file, I had text in the following format:

[character]
[the character's lines]

With that format, I could use a regex to find every line consisting of nothing but a character’s name and then grab the line after it:

import re
# grab the line that follows every line consisting solely of the character's name
lCharacterLines = re.findall(r"^" + re.escape(sCharacter) + r"\n(.*)$", sText, flags=re.MULTILINE)
sCharacterLines = " ".join(lCharacterLines) # join list back to full string
sCharacterLines = removeBrackets(sCharacterLines) # remove everything in [] (including the '[]' themselves)

Unfortunately, this messy method wasn’t quite that easy - I also had to do a bit of cleaning. I noticed that any stage cues or scene information on the site were always wrapped in brackets: []. I of course needed to remove those chunks because I didn’t want any of that stuff to get into the characters’ lines and thus our Markov Chain. For some reason regex’ing to remove everything inside [] wasn’t working for me, so I ended up using a small function I found on Stack Overflow:

def removeBrackets(test_str):
    ret = ''
    depth = 0 # how many '[' are currently open
    for ch in test_str:
        if ch == '[':
            depth += 1
        elif ch == ']' and depth > 0:
            depth -= 1
        elif depth == 0:
            ret += ch # keep only characters outside of any brackets
    return ret
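To make the two steps concrete, here is a small self-contained sketch that runs both of them on a made-up transcript: grab the line after each character-name line, then strip the bracketed stage directions. For the bracket removal it uses re.sub, which only handles non-nested brackets (the loop above also copes with nesting). The sample text and the function name extract_character_lines are purely illustrative.

```python
import re


def extract_character_lines(text, character):
    """Return the character's lines joined into one string, without [...] chunks."""
    # A line holding only the character's name, followed by the spoken line.
    pattern = r"^" + re.escape(character) + r"\n(.*)$"
    lines = re.findall(pattern, text, flags=re.MULTILINE)
    joined = " ".join(lines)
    # Drop non-nested [stage directions], then collapse leftover runs of spaces.
    without_brackets = re.sub(r"\[[^\[\]]*\]", "", joined)
    return re.sub(r"\s{2,}", " ", without_brackets).strip()


# Made-up transcript in the [character] / [line] layout described above.
sample_text = (
    "Cartman\n"
    "[walks in] Hello there.\n"
    "Kyle\n"
    "What do you want?\n"
    "Cartman\n"
    "Nothing. [leaves]\n"
)
cartman_text = extract_character_lines(sample_text, "Cartman")
kyle_text = extract_character_lines(sample_text, "Kyle")
```

The re.escape call matters for names like "Mr. Mackey", where the dot would otherwise match any character.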

Creating the Markov Chain

With the characters and seasons now built into the backend, it was time to create the actual Markov Chain to generate the characters’ sentences.

To create a Markov Chain, I used RiTaJS’s RiMarkov class, because you only need to provide a chain length and the text of your corpus (in our case, the text file of all of a character’s lines for a given season). Once you’ve done that, RiTaJS works its ✨ magic✨ in the background and creates a Markov Chain generator. (Don’t worry - we will have a deep dive into Markov Chains in a future post here on NLP Champs.) With the RiMarkov object, we can generate sentences with the method generateSentences():

var fs = require('fs');
var rita = require('rita');

var markov = new rita.RiMarkov(3);
var inputText = fs.readFileSync('./data/processed/s20_cartman.txt', 'utf8'); // corpora
markov.loadText(inputText); // load corpora
var sentences = markov.generateSentences(1);
console.log(sentences[0]);

Running this in the terminal with node gives:

chris@chris-mac [~/] node example.js
I wanna get the taste of ass out of my mouth.

Yep, that sounds like Cartman alright.
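If you're curious what the ✨ magic✨ roughly boils down to, here is a toy word-level Markov Chain in Python. This is a sketch of the general technique, not RiTaJS's actual implementation (which also deals with tokenization and sentence boundaries). Here `order` means "how many previous words decide the next word", which may not line up exactly with how RiMarkov counts its chain length. The function names and the tiny corpus are made up.

```python
import random
from collections import defaultdict


def build_chain(text, order):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, order, max_words, rng):
    """Random-walk the chain, starting from a randomly chosen state."""
    state = rng.choice(sorted(chain))  # sorted so a seeded rng is reproducible
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(tuple(output[-order:]))
        if not followers:  # dead end: this state only occurs at the very end of the corpus
            break
        output.append(rng.choice(followers))
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran to the door"
order = 2
chain = build_chain(corpus, order)
sentence = generate(chain, order, max_words=8, rng=random.Random(0))
```

The bigger `order` gets, the fewer choices each state has, so the output drifts toward copying the source text word for word. That is the trade-off behind the chain length option in the app.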

Okay, we’ve got our cleanly labeled corpora files for each character-season combo and our RiMarkov object generating sentences for us, so all that’s left to do is create a nice frontend UI and hook it up to the backend!

Frontend

Scaffolding and Frameworks

The frontend started with the Yeoman react-webpack generator, which is a scaffold for React with webpack - you can write a bunch of components and then with npm run dist they compile down to just 3 files: index.html, app.js, and app.map.js. I’ve really fallen in love with this scaffold for when I want to quickly build a small webapp like this one. For styling, I use the CSS from Bootstrap since I’m familiar with that from my ol’ web 1.0 days.

User Selection in the UI

So, scaffolding done. I knew for the UI I wanted to provide the user with the same select options that we built the backend to handle: 1. character, 2. season, and 3. Markov Chain length. This of course is a perfect use for state, and appears as the following in the class constructor:

this.state = {
  season: "20",
  seasonDisplay: "Season 20",
  character: "berries",
  characterDisplay: "Memberberries",
  characterImage: berries,
  markov: '2',
  sentence: "...",
  year: new Date().getFullYear()
}

Note that for each we have the actual value (this is the program friendly value that we will send to the backend) and the ‘display’ value (this is the value that the user sees.)

To actually show the options in the app, I decided a dropdown was probably the best and simplest way, and I used the tasty react-select library to bind the state and the options to the UI. I also thought it would be cool to have a picture of the character that changes dynamically every time the user selects a new character. Again - easily accomplished with state:

// images
let berries = require('../images/berries.jpg');
let cartman = require('../images/cartman.jpg');
let stan = require('../images/stan.jpg');
let kyle = require('../images/kyle.jpg');
let kenny = require('../images/kenny.jpg');
let pcprincipal = require('../images/pcprincipal.png');

...

// used in onChangeCharacter when state is changed - state name points to required image
var oImageRefs = {
  'berries': berries,
  'cartman': cartman,
  'stan': stan,
  'kyle': kyle,
  'kenny': kenny,
  'pcprincipal': pcprincipal
}

...

onChangeCharacter(oSelected) {
  console.log(oSelected);
  var oImage = oImageRefs[oSelected.value];
  this.setState({character: oSelected.value, characterDisplay: oSelected.label, characterImage: oImage});
}

And that was about it for the UI aspects.

$.ajax POST to the Backend

With UI aspects done, I needed to simply query our Markov Chain in the backend to generate a sentence! To accomplish this, I just hooked up the state to a JSON object and made a call to the backend with a simple $.ajax call:

// post to server
var that = this;
var data = JSON.stringify({
  markov: this.state.markov,
  season: this.state.season,
  character: this.state.character
});
if (typeof window !== 'undefined') {
 $.ajax({
    url: '/generate_sentence',
    type: 'POST',
    dataType: 'json',
    data: data,
    contentType: 'application/json',
    cache: false,
    timeout: 5000,
    success: function(oData) {
      if (oData.bNotFound) {
        console.log('showing message...');
        notify.show('It seems that character doesn\'t show up in that season! Try a different combination!', 'custom', 5000, notifyColor); // warn the user that this combination has no lines
      } else {
        that.setState({sentence: oData.sentence});
      }
    },
    error: function() {
    }
  });
}

For character-season combinations that have no lines (AKA no file on the backend), I created a small warning message using the react-notify-toast library - a big ol’ notification at the top of the site that warns the user that the combination doesn’t exist:

Preview of what the error notification shows when the season-character combination does not have any character lines associated with it.

On the backend, this is just a simple try-catch around the fs file read:

var iMarkov = parseInt(req.body.markov, 10); // markov chain length
var sSeason = req.body.season; // desired season
var sCharacter = req.body.character; // desired character
var sFileName = './data/processed/s' + sSeason + '_' + sCharacter + '.txt';
var inputText;
try {
  inputText = fs.readFileSync(sFileName, 'utf8');
} catch (err) {
  var response = {
      status: 200,
      sentence: '',
      bNotFound: true
  };
  res.send(JSON.stringify(response));
  return;
}

Yes - I know what some of you are thinking: I could indeed build a check into the character state so that when a season is selected, only the valid characters for that season are offered, or vice-versa - but that would have been a bit more work, which I was not interested in doing for this small project :)
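For anyone who does want to build that check, the information is already sitting in the processed file names. Here is a rough Python sketch (not part of the project) that turns a list of names like s20_cartman.txt into a character-to-seasons lookup, which the frontend could then use to filter its dropdowns. The function name seasons_by_character and the example file list are hypothetical.

```python
import re
from collections import defaultdict


def seasons_by_character(filenames):
    """Build {character: [season numbers]} from names like 's20_cartman.txt'."""
    available = defaultdict(list)
    for name in filenames:
        match = re.fullmatch(r"s(\d+)_(.+)\.txt", name)
        if match:  # ignore anything that does not follow the naming scheme
            available[match.group(2)].append(int(match.group(1)))
    return {character: sorted(seasons) for character, seasons in available.items()}


# In the real project this list could come from os.listdir('./data/processed').
example_files = ["s20_cartman.txt", "s1_cartman.txt", "s19_pcprincipal.txt", "notes.md"]
valid_seasons = seasons_by_character(example_files)
```

With a lookup like this, a character-season combination with no file would never be selectable in the first place, so the not-found notification would only be a fallback.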

Conclusions and Notes

Well, that’s about it - this was my first full stack app blog post! What did you think? Could I change anything? Do anything better? Let me know in the comments below! Cheers! 🍺

Code and Code Samples

Here are some of the main code samples from the project - both the backend (index.js) and frontend (main.js) code. While these are linked directly from the GitHub repository via its API, if you really want to learn how the app works, your best bet would be going directly to the repo for this project.

Main.js

require('normalize.css/normalize.css');
require('styles/App.css');

// react and other 3rd party
import React from 'react';
import Select from 'react-select';
import Notifications, {notify} from 'react-notify-toast';

// Be sure to include styles at some point, probably during your bootstrapping
import 'react-select/dist/react-select.css';

// constants and colors
const red   = '#FF0000';
const white = '#FFFFFF';
let notifyColor = { background: red, text: white };

// images
let berries = require('../images/berries.jpg');
let cartman = require('../images/cartman.jpg');
let stan = require('../images/stan.jpg');
let kyle = require('../images/kyle.jpg');
let kenny = require('../images/kenny.jpg');
let pcprincipal = require('../images/pcprincipal.png');
let mrmackey = require('../images/mackey.jpg');

var $ = require("jquery");

var oImageRefs = {
  'berries': berries,
  'cartman': cartman,
  'stan': stan,
  'kyle': kyle,
  'kenny': kenny,
  'pcprincipal': pcprincipal,
  'mr.mackey': mrmackey
}
var oMarkovOptions = [
  { value: '2', label: '2 - not very legible but creative' },
  { value: '3', label: '3 - pretty legible but not as creative' },
  { value: '4', label: '4 - very legible and not very creative' },
  { value: '5', label: '5 - nearly exact copy of character\'s lines' }
];
var oCharacterOptions = [
  { value: 'berries', label: 'Memberberries' },
  { value: 'cartman', label: 'Cartman' },
  { value: 'stan', label: 'Stan' },
  { value: 'kyle', label: 'Kyle' },
  { value: 'kenny', label: 'Kenny' },
  { value: 'pcprincipal', label: 'PC Principal' },
  { value: 'mr.mackey', label: 'Mr. Mackey' }
];

// magic ES6... because ben is mean :)
var aSeasonOptions = [];
[...Array(20)].map((_, i) => { aSeasonOptions.push({value: (i + 1).toString(), label: 'Season ' + (i + 1).toString()})});
aSeasonOptions = aSeasonOptions.reverse();

class AppComponent extends React.Component {
  constructor() {
    super();
    this.state = {
      season: "20",
      seasonDisplay: "Season 20",
      character: "berries",
      characterDisplay: "Memberberries",
      characterImage: berries,
      markov: '2',
      sentence: "...",
      year: new Date().getFullYear()
    }
    this.onChangeCharacter = this.onChangeCharacter.bind(this);
    this.onChangeSeason = this.onChangeSeason.bind(this);
    this.onChangeMarkov = this.onChangeMarkov.bind(this);
    this.onClickGenerateSentence = this.onClickGenerateSentence.bind(this);
  }
  onChangeSeason(oSelected) {
    this.setState({season: oSelected.value, seasonDisplay: oSelected.label});
  }
  onChangeCharacter(oSelected) {
    console.log(oSelected);
    var oImage = oImageRefs[oSelected.value];
    this.setState({character: oSelected.value, characterDisplay: oSelected.label, characterImage: oImage});
  }
  onChangeMarkov(oSelected) {
    this.setState({markov: oSelected.value});
  }
  onClickGenerateSentence() {
    // post to server
    var that = this;
    var data = JSON.stringify({
      markov: this.state.markov,
      season: this.state.season,
      character: this.state.character
    });
    if (typeof window !== 'undefined') {
     $.ajax({
        url: '/generate_sentence',
        type: 'POST',
        dataType: 'json',
        data: data,
        contentType: 'application/json',
        cache: false,
        timeout: 5000,
        success: function(oData) {
          if (oData.bNotFound) {
            console.log('showing message...');
            notify.show('It seems that character doesn\'t show up in that season! Try a different combination!', 'custom', 5000, notifyColor); // warn the user that this combination has no lines
          } else {
            that.setState({sentence: oData.sentence});
          }
        },
        error: function() {
        }
      });
    }
  }
  render() {
    return (
      <div>
        <Notifications options={{zIndex: 9999999}}/>
        {/* Navigation */}
        <nav className="navbar navbar-expand-lg navbar-light fixed-top" id="mainNav">
          <div className="container">
            <span className="navbar-brand js-scroll-trigger">Southpark Markov Chains!</span>
            <button className="navbar-toggler navbar-toggler-right" type="button" data-toggle="collapse" data-target="#navbarResponsive" aria-controls="navbarResponsive" aria-expanded="false" aria-label="Toggle navigation">
              Menu
              <i className="fa fa-bars" />
            </button>
            <div className="collapse navbar-collapse" id="navbarResponsive">
              <ul className="navbar-nav ml-auto">
                <li className="nav-item">
                  <a className="nav-link js-scroll-trigger" href="#createASentence">Create a sentence!</a>
                </li>
                <li className="nav-item">
                  <a className="nav-link js-scroll-trigger" href="#whoDunIt">Who dun' it?!?!</a>
                </li>
              </ul>
            </div>
          </div>
        </nav>
        {/* Header */}
        <header className="masthead">
          <div className="container">
            <div className="intro-text">
              <span className="name">Southpark Markov Chains!</span>
              <hr className="star-light" />
              <span className="skills">Quick bites (bytes?) of hilarity from neat NLP processes.<br/>Scroll to check it out.</span>
            </div>
          </div>
        </header>
        {/* About Section */}
        <section className="contact" id="createASentence">
          <div className="container">
            <img className="mx-auto d-block" width="175rem" src={this.state.characterImage} alt={this.state.characterDisplay}/>
            <h5 className="text-center">You've currently selected:</h5>
            <h2 className="text-center">{this.state.characterDisplay}<br/>({this.state.seasonDisplay})</h2>
              <div className="row">
                <div className="col-lg-6 mx-auto">
                <p>Season</p>
                </div>
              </div>
            <div className="row">
                <div className="col-lg-6 mx-auto">
                  <Select
                    value={this.state.season}
                    options={aSeasonOptions}
                    onChange={this.onChangeSeason}
                  />
                </div>
              </div>
              <div className="row">
                <div className="col-lg-6 mx-auto">
                <p>Character</p>
                </div>
              </div>
            <div className="row">
              <div className="col-lg-6 mx-auto">
                <Select
                  value={this.state.character}
                  options={oCharacterOptions}
                  onChange={this.onChangeCharacter}
                />
              </div>
            </div>
            <div className="row">
              <div className="col-lg-6 mx-auto">
              <p>Markov chain length</p>
              </div>
            </div>
            <div className="row">
                <div className="col-lg-6 mx-auto">
                  <Select
                    value={this.state.markov}
                    options={oMarkovOptions}
                    onChange={this.onChangeMarkov}
                  />
                </div>
              </div>
            <div className="row">
              <div className="col-lg-12 text-center">
                <div className="dropdown">
                  <button className="btn btn-primary btn-lg mx-auto" type="button" onClick={this.onClickGenerateSentence}>
                    GENERATE!
                  </button>
                </div>
              </div>
            </div>
            <div className="row">
              <div className="col-lg-8 mx-auto">
                <p className="text-center">Go ahead - select a season, a character, then click GENERATE!</p>
              </div>
            </div>
            <div className="row">
              <div className="col-lg-8 mx-auto">
                <h1 className="text-center"><span style={{color:'blue'}}>"</span>{this.state.sentence}<span style={{color:'blue'}}>"</span></h1>
              </div>
            </div>
          </div>
        </section>
        {/* Contact Section */}
        <section id="whoDunIt">
          <div className="container">
            <h2 className="text-center">Who dun' it?!?!</h2>
            <hr className="star-primary" />
            <div className="row">
              <div className="col-lg-8 mx-auto">
                <p>Why, NLP Champs my dear person! Who are the NLP Champs? NLP Champs are a bunch of bloggers who insist on making natural language processing (NLP) available to all, from the champiest of champs, to the noobiest of noobs. You can start anywhere, just dive in! Don't know what NLP is? Don't want to be a champ? Don't care? It doesn't matter! Check out our NLP-beginner-and-expert friendly site <a href="https://nlp-champs.com">here</a>. (We made a <a href="http://npl-champs.com/creating-markov-chains-for-various-characters-of-south-park-with-react-and-ritajs">blog post</a> about how we built this very site!)</p>
              </div>
            </div>
          </div>
        </section>
        {/* Footer */}
        <footer className="text-center">
          <div className="footer-above">
            <div className="container">
              <div className="row">
                <div className="footer-col col-md-8 mx-auto">
                  <h3>About NLP Champs</h3>
                  <p>NLP Champs is a community focused on developing and sharing natural language processing techniques. Check us out!<br/><a href="https://nlp-champs.com">nlp-champs.com</a></p>
                </div>
                <div className="footer-col col-md-8 mx-auto">
                  <h3>Credits and Thanks</h3>
                  <ul>
                    <li>EXCELLENT curated .csv files for seasons 1 - 19 from <a href="https://github.com/BobAdamsEE/SouthParkData">BobAdamsEE's github repository</a>.</li>
                    <li>Seasons 20 and 21 pulled manually by hand from the <a href="http://southpark.wikia.com/wiki/Portal:Scripts/Season_Twenty">tasty South Park Wikia.</a></li>
                    <li>Neat Cartman favicon from <a href="http://www.favicon.cc/?action=icon&file_id=255991">lordeblader at favicon.cc</a></li>
                    <li><a href="https://github.com/dhowe/RiTaJS">RiTaJS Library</a></li>
                    <li><a href="https://github.com/react-webpack-generators/generator-react-webpack">Yeoman React-Webpack Generator</a></li>
                  </ul>
                </div>
              </div>
            </div>
          </div>
          <div className="footer-below">
            <div className="container">
              <div className="row">
                <div className="col-lg-12">
                  Copyright © NLP-Champs {this.state.year}
                </div>
              </div>
            </div>
          </div>
        </footer>
      </div>
    );
  }
}

AppComponent.defaultProps = {
};

export default AppComponent;

index.js

var http = require('http');
const express = require('express');
const morgan = require('morgan');
const path = require('path');
const app = express();
var fs = require('fs');
var bodyParser = require('body-parser');
var rita = require('rita');

// bodyParser to get posts from $.ajax
app.use(bodyParser.json());

// Serve static assets
app.use(express.static('./dist'));

// Setup logger
app.use(morgan(':remote-addr - :remote-user [:date[clf]] ":method :url HTTP/:http-version" :status :res[content-length] :response-time ms'));

// on a POST to /generate_sentence, generate a sentence!!!!
app.post('/generate_sentence', function (req, res) {
  var iMarkov = parseInt(req.body.markov, 10); // markov chain length
  var sSeason = req.body.season; // desired season
  var sCharacter = req.body.character; // desired character
  var sFileName = './data/processed/s' + sSeason + '_' + sCharacter + '.txt';
  var inputText;
  try {
    inputText = fs.readFileSync(sFileName, 'utf8');
  } catch (err) {
    var response = {
        status: 200,
        sentence: '',
        bNotFound: true
    };
    res.send(JSON.stringify(response));
    return;
  }
  var markov = new rita.RiMarkov(iMarkov);
  markov.loadText(inputText);
  var sentences = markov.generateSentences(1);
  var response = {
      status: 200,
      sentence: sentences[0],
      bNotFound: false
  };
  res.send(JSON.stringify(response));
});

var server = http.createServer(app);

// listening ports
server.listen(process.env.PORT || 9000);
