This is not meant to be a guide, but if you're a poet interested in using NLP libraries, it may help you get started.
While you can use Python, I use JavaScript. It is possible to use WinkNLP in a browser, but because it's written as a Node.js package, it has to be bundled with browserify first. I'm having trouble getting the dependency tree established, so putting anything I've created in a browser is out of reach for now. Anyone who can build a proper dependency tree could browserify anything I've made, but it isn't presently worth the effort for me. I'm just learning myself.
What's of interest to poets? Right now, I'm building code that performs extractions on text. This code, built entirely from the very useful documentation on Wink's page (plus some help from ChatGPT), extracts noun phrases: a noun or proper noun, optionally preceded by a determiner and a single adjective. It looks like this:
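const winkNLP = require('wink-nlp');
// English "lite" model; it supplies the part-of-speech tags the pattern uses
const model = require('wink-eng-lite-web-model');
const fs = require('fs');
const nlp = winkNLP(model);
// const text = 'The quick brown fox jumps over the lazy dog.';
// Read the source text from input.txt in the current folder
const text = fs.readFileSync('input.txt', 'utf-8');
// Optional determiner, optional adjective, then a noun or proper noun.
// An empty alternative, as in "[|DET]", makes that slot optional.
const patterns = [
  {
    name: 'nounPhrase',
    patterns: [ '[|DET] [|ADJ] [NOUN|PROPN]' ]
  }
];
nlp.learnCustomEntities(patterns);
const doc = nlp.readDoc(text);
// out() returns the matched text of each custom entity as an array of strings
const entities = doc.customEntities().out();
// Create a new file and write the entities to it, one per line
fs.writeFile('output.txt', entities.join('\n'), (err) => {
  if (err) throw err;
  console.log('The output has been written to output.txt');
});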
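To see what the pattern '[|DET] [|ADJ] [NOUN|PROPN]' actually describes, here is a toy matcher in plain JavaScript that needs no libraries. It is only an illustration, not how wink-nlp works internally: the part-of-speech tags are assigned by hand (hypothetical), and the names `parsePattern`, `matchFrom`, and `extract` are made up for this sketch. wink-nlp's model may tag words differently and may resolve matches differently, so treat the output as a picture of the pattern, not of wink-nlp's results.

```javascript
// Toy matcher for wink-style patterns such as '[|DET] [|ADJ] [NOUN|PROPN]'.
// Illustration only: NOT wink-nlp's implementation, and the tags below are
// assigned by hand rather than by wink's model.

// Turn a pattern string into slots (every slot must be in brackets).
// An empty alternative, as in "[|DET]", makes the slot optional.
function parsePattern(pattern) {
  return pattern.split(' ').map((slot) => {
    const alternatives = slot.slice(1, -1).split('|');
    return {
      optional: alternatives.includes(''),
      tags: new Set(alternatives.filter((tag) => tag !== '')),
    };
  });
}

// Longest match of slots[s..] starting at token t: returns the end index, or -1.
function matchFrom(tokens, slots, t, s) {
  if (s === slots.length) return t;
  const slot = slots[s];
  let best = -1;
  // Option 1: the current token fills this slot.
  if (t < tokens.length && slot.tags.has(tokens[t].pos)) {
    best = matchFrom(tokens, slots, t + 1, s + 1);
  }
  // Option 2: an optional slot is skipped without consuming a token.
  if (slot.optional) {
    best = Math.max(best, matchFrom(tokens, slots, t, s + 1));
  }
  return best;
}

// Scan left to right, keeping the longest non-empty match at each position.
function extract(tokens, pattern) {
  const slots = parsePattern(pattern);
  const phrases = [];
  let t = 0;
  while (t < tokens.length) {
    const end = matchFrom(tokens, slots, t, 0);
    if (end > t) {
      phrases.push(tokens.slice(t, end).map((tok) => tok.text).join(' '));
      t = end; // resume after the match so phrases never overlap
    } else {
      t += 1;
    }
  }
  return phrases;
}

// Hand-tagged example sentence (hypothetical tags, Universal POS style).
const tagged = [
  { text: 'The', pos: 'DET' }, { text: 'quick', pos: 'ADJ' },
  { text: 'brown', pos: 'ADJ' }, { text: 'fox', pos: 'NOUN' },
  { text: 'jumps', pos: 'VERB' }, { text: 'over', pos: 'ADP' },
  { text: 'the', pos: 'DET' }, { text: 'lazy', pos: 'ADJ' },
  { text: 'dog', pos: 'NOUN' }, { text: '.', pos: 'PUNCT' },
];

console.log(extract(tagged, '[|DET] [|ADJ] [NOUN|PROPN]'));
// → [ 'brown fox', 'the lazy dog' ]
console.log(extract(tagged, '[|DET] [|ADJ] [|ADJ] [NOUN|PROPN]'));
// → [ 'The quick brown fox', 'the lazy dog' ]
```

Because the pattern allows only one adjective, "The quick brown fox" comes out of the toy as just "brown fox". A second optional adjective slot picks up the whole phrase here; it is an assumption, not something checked, that wink-nlp would treat the extra slot the same way.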
I ran it through The Book of the Damned, and it produced a list of noun phrases for the entire document.
How is this useful?
There are a lot of reasons this could be useful. One of the biggest impediments to training a decent language model is preparing the data. It's perfectly possible to throw everything in at once, but the model won't know what to do with it. If you want to train an AI on noun phrases, feeding it extracted noun phrases makes more sense than sending through a ton of raw data. The AI has no intelligence at all; on its own, it has no idea what to do with that data. So preparing the data is necessary and tedious, and this process makes it a lot easier.
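The data preparation described above can start very simply. Here is a minimal sketch, assuming the extracted phrases sit one per line as in output.txt: it trims and collapses whitespace, lowercases, and drops blank lines and duplicates. The function name `preparePhrases` and the sample lines are hypothetical, made up for illustration.

```javascript
// Hypothetical clean-up step for extracted phrases (one per line, as in output.txt).
// Trims and collapses whitespace, lowercases, and drops blanks and duplicates,
// keeping the first occurrence of each phrase in its original order.
function preparePhrases(lines) {
  const seen = new Set();
  const cleaned = [];
  for (const line of lines) {
    const phrase = line.trim().replace(/\s+/g, ' ').toLowerCase();
    if (phrase === '' || seen.has(phrase)) continue;
    seen.add(phrase);
    cleaned.push(phrase);
  }
  return cleaned;
}

// Made-up sample lines standing in for the contents of output.txt.
const sample = ['The lazy dog', '  the   lazy dog ', '', 'brown fox', 'Brown Fox'];
console.log(preparePhrases(sample));
// → [ 'the lazy dog', 'brown fox' ]
```

With real data you would pass it `fs.readFileSync('output.txt', 'utf-8').split('\n')`. Lowercasing merges "The Sea" and "the sea", but it also flattens proper nouns, so drop the `toLowerCase()` call if capitalization matters for your purposes.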
It could also be useful for many kinds of research.
How can I get started?
If you're starting with no programming knowledge, this is probably a bad place to start. You should at least familiarize yourself with JavaScript first; there are free online courses available to everyone.
Those with some programming knowledge will need to install Node.js. Once it's installed, you should be able to follow the guidelines on the Wink page to set up a WinkNLP environment.
I had no knowledge of Node.js until a week ago, but I was able to get everything up and running without much trouble.
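So, you need to install Node.js, plus the two packages the program loads with require: wink-nlp and wink-eng-lite-web-model (fs comes built into Node.js).
With those in place, the program above will extract noun phrases. Next up, I'll be trying to extract other kinds of phrases. There are no examples for that, so I'm going to have to figure it out myself. Let's see if ChatGPT can help.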
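One possible starting point for other phrase types: `learnCustomEntities` takes an array of named entries, so more phrase shapes can sit alongside `nounPhrase`. The sketch below is a drop-in replacement for the `patterns` constant in the program above. The entry names and tag sequences for `prepositionalPhrase` and `verbPhrase` are hypothetical guesses, not examples from the wink-nlp documentation, and they have not been tested against the model.

```javascript
// Hypothetical replacement for the `patterns` constant in the program above.
// Only 'nounPhrase' comes from the original program; the other two entries
// are untested guesses at useful phrase shapes using Universal POS tags.
const patterns = [
  {
    name: 'nounPhrase',
    patterns: [ '[|DET] [|ADJ] [NOUN|PROPN]' ]
  },
  {
    // e.g. "over the lazy dog": a preposition followed by a noun phrase
    name: 'prepositionalPhrase',
    patterns: [ '[ADP] [|DET] [|ADJ] [NOUN|PROPN]' ]
  },
  {
    // e.g. "slowly sank": a verb with an optional adverb in front
    name: 'verbPhrase',
    patterns: [ '[|ADV] [VERB]' ]
  }
];
```

With several entries, the plain `out()` call no longer says which kind of phrase each match is; `doc.customEntities().out(nlp.its.detail)` should return each match together with its type, though that is worth confirming in the wink-nlp documentation, as is how wink-nlp resolves matches that overlap (a prepositional phrase contains a noun phrase, for instance).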