Dead Kitty: February 2023

Monday, February 27, 2023

New Features on My Ngram Generator

I've made some cool changes to the NGRAM generator.

You can now "generate by word," which selects a random word and opens a window listing all of its NGRAM links. Select one of the links, and the next window loads all of the links for the word you just selected. This is a more intentional way of generating. Until now, my approach has primarily been to create from blocks of text, so this offers a new way to create from NGRAMs.
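Here is a rough sketch of the "generate by word" idea. It is not the generator's actual code: the names `buildChain`, `linksFor`, and `pickRandomWord` are hypothetical, and it assumes the simplest possible model, a map from each word to the words that follow it.

```javascript
// Hypothetical sketch of "generate by word"; names are illustrative only.

// Build a map from each word to the list of words that follow it.
function buildChain(text) {
  const words = text.split(/\s+/).filter(Boolean);
  const chain = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    if (!chain.has(words[i])) chain.set(words[i], []);
    chain.get(words[i]).push(words[i + 1]);
  }
  return chain;
}

// The distinct links to show in the window for a given word.
function linksFor(chain, word) {
  return [...new Set(chain.get(word) || [])];
}

// Starting point: a random word that has at least one link.
function pickRandomWord(chain) {
  const keys = [...chain.keys()];
  return keys[Math.floor(Math.random() * keys.length)];
}

const chain = buildChain('the light at night the light of day');
const start = pickRandomWord(chain);
console.log(start, '->', linksFor(chain, start));
console.log(linksFor(chain, 'light')); // [ 'at', 'of' ]
```

Each time a link is chosen, calling `linksFor` again with the chosen word gives the contents of the next window.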

Just as NGRAMs can be selected more intentionally this way, they can also be prepared more intentionally.

The new algorithm preserves whitespace, but at first it was producing too much of it. For that reason, I added a line of code that prevents an association between two identical NGRAMs. Therefore, you will never find ..., 27 newlines in a row, or other crap I didn't want, while still potentially saving some of the formatting.
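The rule about identical NGRAMs can be sketched like this. This is an assumption about how such a check might look, not the line from the generator itself; `buildChainSkippingRepeats` is a made-up name, and the tokenizer is a simplified one that keeps newlines and tabs as tokens.

```javascript
// Hypothetical sketch: keep newlines as tokens, but never link a token to an identical one.
function buildChainSkippingRepeats(text) {
  // Words, runs of punctuation, and individual newlines or tabs each become one token.
  const tokens = text.match(/\w+|[^\s\w]+|\n|\t/g) || [];
  const chain = new Map();
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    // Skip identical neighbours, e.g. a newline followed by another newline.
    if (current === next) continue;
    if (!chain.has(current)) chain.set(current, []);
    chain.get(current).push(next);
  }
  return chain;
}

const demo = buildChainSkippingRepeats('stars\n\n\nfall fall down');
console.log(demo.get('\n'));   // [ 'fall' ]
console.log(demo.get('fall')); // [ 'down' ]
```

Because a newline is never linked to another newline, a generated text can contain a line break but cannot chain straight into a second one.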

If you use the generator, I recommend downloading a copy, placing it in a directory, and then saving some data for use later. This is a pretty solid generator, but there's still a lot more we can do.

Friday, February 24, 2023

Third-party ngram generator

This generator produces very nice, clean text in sentences creating large blocks of text. I like this better than algorithms that include linebreaks. I thought including linebreaks would be better, but it isn't. At any rate, this algorithm produces really nice output. You can install the library here.

Sample text from my own work:

collapsing into. for the prophet s desire for by which like nations. the light at Adam Adam at a night then comdemn. and the battle for hell. our way to characterize itself. such burials so the plucked fish go to the function purpose cause an epiphany. the yawns of self satisfied. a dark brown sludge cries of good soil they will win out of genesis of destinies.

var generator = require('ngram-natural-language-generator').generator;
var fs = require('fs');

generator({
    filename: 'corpus.txt',
    model: {
        maxLength: 1000,
        minLength: 900
    }
}, function(err, sentence){
    if (err) {
        console.error(err);
    } else {
        fs.writeFile('ngramsoutput.txt', sentence, function(err) {
            if (err) {
                console.error(err);
            } else {
                console.log('Sentence saved to ngramsoutput.txt!');
            }
        });
    }
});

Wednesday, February 22, 2023

Bayes Sandbox (Updated)

wink-expression-extractor.js

Takes "input.txt" and produces "output.json".

const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const fs = require('fs');

const nlp = winkNLP(model);

const text = fs.readFileSync('input.txt', 'utf-8');

// Obtain "its" helper to extract item properties.
const its = nlp.its;
// Obtain "as" reducer helper to reduce a collection.
const as = nlp.as;

const patterns = [
  {
    name: 'nounPhrase',
    label: 'nounPhrase',
    patterns: [ '[|DET] [|ADJ] [NOUN|PROPN]' ]
  },
  {
    name: 'verbPhrase',
    label: 'verbPhrase',
    patterns: [ '[|ADV] [|PARTICLE] [|ADJ] [|NOUN] [VERB]' ]
  }
];

nlp.learnCustomEntities(patterns);

const doc = nlp.readDoc(text);

const entities = doc.customEntities().out(its.detail);

fs.writeFile('output.json', JSON.stringify(entities), (err) => {
  if (err) throw err;
  console.log('The output has been written to output.json');
});

input.txt can be any text file. The script extracts the noun and verb phrases that match the patterns above and writes them to a JSON file.
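Before training on output.json, it can help to see how many phrases of each kind were extracted. The sketch below assumes each entry has the `{ value, type }` shape that the training script reads; `countByType` is a hypothetical helper, and the verb phrase in the sample data is invented.

```javascript
// Sample data in the shape output.json is assumed to have: [{ value, type }, ...].
// The verb phrase here is invented for illustration.
const sampleEntities = [
  { value: 'the dissipation', type: 'nounPhrase' },
  { value: 'an exorcism', type: 'nounPhrase' },
  { value: 'slowly falls', type: 'verbPhrase' }
];

// Count how many phrases were extracted for each pattern name.
function countByType(entities) {
  const counts = {};
  for (const { type } of entities) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

console.log(countByType(sampleEntities)); // { nounPhrase: 2, verbPhrase: 1 }
```

To run it on real data, replace `sampleEntities` with the parsed contents of output.json. A very lopsided count is worth knowing about, since the classifier will lean toward the larger class.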

Bayes_sandbox.js

Takes "output.json"

Expected output: "verbPhrase"

const fs = require('fs');

// Read the contents of the file into a string
const jsonString = fs.readFileSync('output.json', 'utf-8');

// Parse the JSON string into a JavaScript object
const jsonObj = JSON.parse(jsonString);

// Load Naive Bayes Text Classifier
var Classifier = require( 'wink-naive-bayes-text-classifier' );
// Instantiate
var nbc = Classifier();
// Load wink nlp and its model
const winkNLP = require( 'wink-nlp' );
// Load language model
const model = require( 'wink-eng-lite-web-model' );
const nlp = winkNLP( model );
const its = nlp.its;

const prepTask = function ( text ) {
  const tokens = [];
  nlp.readDoc(text)
      .tokens()
      // Use only words ignoring punctuations etc and from them remove stop words
      .filter( (t) => ( t.out(its.type) === 'word' && !t.out(its.stopWordFlag) ) )
      // Handle negation and extract stem of the word
      .each( (t) => tokens.push( (t.out(its.negationFlag)) ? '!' + t.out(its.stem) : t.out(its.stem) ) );

  return tokens;
};
nbc.definePrepTasks( [ prepTask ] );
// Configure behavior
nbc.defineConfig( { considerOnlyPresence: true, smoothingFactor: 0.5 } );
// Train!

jsonObj.forEach(obj =>
    nbc.learn(obj.value, obj.type));

nbc.consolidate();

console.log( nbc.predict( 'failing stars' ) );

This file uses the data extracted by the first file to train a Bayes classifier. It should then predict that "failing stars" is a verb phrase.
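To show roughly what a classifier like this does with its training data, here is a stripped-down naive Bayes sketch in plain JavaScript. It is not wink's implementation: it skips stemming, stop-word removal, and negation handling, it only scores the words that appear in the input, and the names `tokenize`, `train`, and `predict` as well as the training phrases are made up. It does use presence-only counts and additive smoothing, loosely in the spirit of the considerOnlyPresence and smoothingFactor settings above.

```javascript
// Simplified naive Bayes sketch. NOT wink's implementation.
// All names and training data are illustrative.

// Lowercase the text and keep each distinct word once (presence only).
const tokenize = (text) => [...new Set(text.toLowerCase().match(/[a-z]+/g) || [])];

function train(examples) {
  const model = { total: 0, docs: new Map(), wordDocs: new Map() };
  for (const [text, label] of examples) {
    model.total += 1;
    model.docs.set(label, (model.docs.get(label) || 0) + 1);
    if (!model.wordDocs.has(label)) model.wordDocs.set(label, new Map());
    const counts = model.wordDocs.get(label);
    // Count how many training phrases with this label contain each word.
    for (const word of tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return model;
}

function predict(model, text, smoothing = 0.5) {
  let best = null;
  let bestScore = -Infinity;
  for (const [label, docCount] of model.docs) {
    // Start from log P(label), then add log P(word | label) for each input word.
    let score = Math.log(docCount / model.total);
    for (const word of tokenize(text)) {
      const seen = model.wordDocs.get(label).get(word) || 0;
      // Additive smoothing keeps unseen words from zeroing out a label.
      score += Math.log((seen + smoothing) / (docCount + 2 * smoothing));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Made-up training phrases.
const toyModel = train([
  ['the dissipation', 'nounPhrase'],
  ['an exorcism', 'nounPhrase'],
  ['dark stars', 'nounPhrase'],
  ['slowly failing', 'verbPhrase'],
  ['quickly falling', 'verbPhrase']
]);

console.log(predict(toyModel, 'failing'));    // verbPhrase
console.log(predict(toyModel, 'dark stars')); // nounPhrase
```

The prediction depends entirely on which words showed up under which label during training, so the quality of the extracted phrases matters more than the classifier itself.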

Bayes Sandbox

WinkNLP has a simple but powerful text classifier using Bayesian analysis. I have working code.

const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const fs = require('fs');

const nlp = winkNLP(model);

const text = fs.readFileSync('input.txt', 'utf-8');

const patterns = [
  {
    name: 'nounPhrase',
    patterns: [ '[|DET] [|ADJ] [NOUN|PROPN]' ]
  },
];

nlp.learnCustomEntities(patterns);

const doc = nlp.readDoc(text);
const entities = doc.customEntities().out();

fs.writeFile('noun_phrases.txt', entities.join('\n'), (err) => {
  if (err) throw err;
  console.log('The output has been written to noun_phrases.txt');
});

Okay, this is clunky but functioning. You will need to run this code twice: once to create a text file called noun_phrases.txt and once more to create a text file called verb_phrases.txt.

Now, we want to do the same thing, but create a file called verb_phrases.txt. To make this as simple as possible, I've provided a separate file. Clunky, I know.

const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const fs = require('fs');

const nlp = winkNLP(model);

const text = fs.readFileSync('input.txt', 'utf-8');

const patterns = [
  {
    name: 'verbPhrase',
    patterns: [ '[|ADV] [|PARTICLE] [|ADJ] [|NOUN] [VERB]' ]
  }
];

nlp.learnCustomEntities(patterns);

const doc = nlp.readDoc(text);
const entities = doc.customEntities().out();

fs.writeFile('verb_phrases.txt', entities.join('\n'), (err) => {
  if (err) throw err;
  console.log('The output has been written to verb_phrases.txt');
});

Obviously, you can just make the edits yourself. I have to preprocess the data to get rid of the clunkiness. This is an epic amount of clunkiness.

// Load Naive Bayes Text Classifier
var Classifier = require('wink-naive-bayes-text-classifier');
// Instantiate
var nbc = Classifier();
// Load wink nlp and its model
const winkNLP = require('wink-nlp');
// Load language model
const model = require('wink-eng-lite-web-model');
const nlp = winkNLP(model);
const its = nlp.its;
const fs = require('fs');

// Function to read a file and split it into one phrase per line
const readAndPreprocessFile = function (filePath) {
  const fileContents = fs.readFileSync(filePath, 'utf-8');
  // Skip blank lines so empty strings are not used as training examples
  const phrases = fileContents.split('\n').filter((line) => line.trim() !== '');
  return phrases;
};

// Define a pre-processing task
const prepTask = function (text) {
  const tokens = [];
  nlp
    .readDoc(text)
    .tokens()
    // Use only words ignoring punctuations etc and from them remove stop words
    .filter((t) => t.out(its.type) === 'word' && !t.out(its.stopWordFlag))
    // Handle negation and extract stem of the word
    .each((t) =>
      tokens.push(t.out(its.negationFlag) ? '!' + t.out(its.stem) : t.out(its.stem))
    );
  return tokens;
};

// Define the pre-processing task for the classifier
nbc.definePrepTasks([prepTask]);

// Configure behavior
nbc.defineConfig({ considerOnlyPresence: true, smoothingFactor: 0.5 });

// Read the file and train the classifier
const nounPhrases = readAndPreprocessFile('noun_phrases.txt');
nounPhrases.forEach((np) => nbc.learn(np, 'nounPhrase'));

const verbPhrases = readAndPreprocessFile('verb_phrases.txt');
verbPhrases.forEach((vp) => nbc.learn(vp, 'verbPhrase'));

nbc.consolidate();

// The classifier is now trained and can be used to make predictions.


console.log( nbc.predict( 'failing stars' ) );

The expected output is verbPhrase.

Tuesday, February 21, 2023

Using WinkNLP in JavaScript

This is not meant to be a guide. But for poets interested in using NLP libraries, this may help you get started.

While you can use Python, I use JavaScript. It is possible to use WinkNLP in a browser, but it must be bundled using browserify because it's a Node.js package. I'm having issues getting the dependency tree established, so putting any of what I've created in a browser is out of reach for now. However, anyone who could build a proper dependency tree could browserify anything I made. It's not presently worth the effort. I'm just learning myself.

What's of interest to poets? Right now, I'm building code that performs extractions on text. This code, which was built entirely from the very useful documentation you can find on Wink's page, and some ChatGPT, will extract noun phrases of any type that include determiners and adjectives. It looks like this:

const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const fs = require('fs');

const nlp = winkNLP(model);

//const text = 'The quick brown fox jumps over the lazy dog.';
const text = fs.readFileSync('input.txt', 'utf-8');
const patterns = [
  {
    name: 'nounPhrase',
    patterns: [ '[|DET] [|ADJ] [NOUN|PROPN]' ]
  }
];

nlp.learnCustomEntities(patterns);
const doc = nlp.readDoc(text);
const entities = doc.customEntities().out();

// create a new file and write the entities to it
fs.writeFile('output.txt', entities.join('\n'), (err) => {
  if (err) throw err;
  console.log('The output has been written to output.txt');
});

I ran it through The Book of the Damned and it produced this:

fictions
the dissipation
entropy
an exorcism

For the entire document.

How is this useful?

There are a lot of reasons this could be useful. One of the biggest impediments to training a decent language model is preparing the data. While it's perfectly possible to throw it all in at once, the model is not going to know what to do with it. If you want to train an AI on noun phrases, sending it noun phrases makes more sense than sending through a ton of raw data. The AI has no intelligence at all. It has no idea what to do with that data. So preparing it is necessary and tedious, and this process makes it a lot easier.
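As a small example of the kind of preparation I mean, here is a sketch that tidies a list of extracted phrases before they are used for anything else. `cleanPhrases` is a hypothetical helper, and the choices it makes (trimming, lowercasing, dropping blanks and duplicates) are assumptions about what "clean" should mean, not requirements of WinkNLP.

```javascript
// Hypothetical helper: tidy a newline-separated list of extracted phrases.
function cleanPhrases(raw) {
  const seen = new Set();
  const cleaned = [];
  for (const line of raw.split('\n')) {
    // Normalize whitespace and case so near-duplicates collapse together.
    const phrase = line.trim().toLowerCase();
    if (phrase === '' || seen.has(phrase)) continue;
    seen.add(phrase);
    cleaned.push(phrase);
  }
  return cleaned;
}

const rawOutput = 'fictions\nthe dissipation\n\nFictions\n  entropy \nan exorcism\n';
console.log(cleanPhrases(rawOutput));
// [ 'fictions', 'the dissipation', 'entropy', 'an exorcism' ]
```

Lowercasing is a judgment call: it merges "Fictions" with "fictions" but also flattens proper nouns, so leave that step out if capitalization matters for your text.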

It would also be useful for all different types of research.

How can I get started?

If you're starting with no knowledge of programming, this is probably a bad place to start. You should at least familiarize yourself with JavaScript. There are free online courses available to everyone.

Those with some programming knowledge will need to install Node.js. If you have Node.js installed, then you should be able to follow the guidelines on the Wink page to set up a WinkNLP environment.

I did not have any knowledge of Node.js prior to a week ago, but was able to get everything up and running without too much of a problem.

So, you need to install Node.js, plus the two packages the program above requires: wink-nlp and wink-eng-lite-web-model.

And the program above will allow you to extract noun phrases. Next up, I'll be trying to extract other phrases. There are no examples, so I'm going to have to figure it out myself. Let's see if ChatGPT can help.

Friday, February 17, 2023

Ask Ezekiel 2.0

Ezekiel Prophecy Generator

How to put an ngram generator in a blog post

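<!DOCTYPE html>
<html>
<head>
<title>Ngram Generator</title>
</head>
<body>
<h1>Ngram Generator</h1>
<!-- Assumed markup: the script reads and writes an element with id "output-text" and needs a trigger for eatText() -->
<textarea id="output-text" rows="10" cols="50"></textarea><br>
<button onclick="eatText()">Eat Text</button>
<button onclick="generateText()">Generate Text</button><br>
<script>
// Maps each token to the list of tokens that followed it in the source text.
// Object.create(null) avoids clashes with built-in keys such as "constructor".
let markovChain = Object.create(null);

function eatText() {
  let text = document.getElementById("output-text").value;
  // Tokens are words, runs of punctuation, and individual tabs or line breaks
  let words = text.match(/\b\w+\b|[^\s\w]+|\t|\r|\n/g) || [];
  for (let i = 0; i < words.length - 1; i++) {
    let currentWord = words[i];
    let nextWord = words[i + 1];
    if (!markovChain[currentWord]) {
      markovChain[currentWord] = [];
    }
    markovChain[currentWord].push(nextWord);
  }
  // Clear the box so it is ready for more text or for the generated output
  document.getElementById("output-text").value = "";
}

function generateText() {
  let numWords = parseInt(prompt("How many words do you want to generate?"));
  let words = Object.keys(markovChain);
  let word = words[Math.floor(Math.random() * words.length)];
  let result = [];
  for (let i = 0; i < numWords; i++) {
    result.push(word);
    // The last token of the source may have no followers, so fall back to an empty list
    let newWords = markovChain[word] || [];
    word = newWords[Math.floor(Math.random() * newWords.length)];
    // Dead end: restart from a random word
    if (!word) {
      word = words[Math.floor(Math.random() * words.length)];
    }
  }
  document.getElementById("output-text").value = result.join(" ");
}
</script>
</body>
</html>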

Ngram Generator

Thursday, February 16, 2023

Letheberries

Hello
