Sunday 16 April 2017

How I came to scrape the world, and, just what have I walked into?

<=== WATCH THIS SPACE : BLINK, BLINK...   

      I AM A WORK-IN-PROGRESS!           ===> 



Well, how did I land up harvesting data?

2008 - "coming out" on the beach
2009 - I took up maths with the Open University
2008 - 2013 - faffed about building some websites, learning a bit of machine learning and the like
2013 - 2015 - I got my first hacking job and dreamed of automating the world. And you ask, how does that work?



2015 - 2017 - Freelancing

What next? AI + CRAWLING == INNOVATION!





Thursday 23 June 2016

Natural Language Processing - learning NLTK from the NLTK book....

Psst: this article is going to grow, and reflect the topics I'm learning, as I'm learning them. So please keep referring back for new content! Constructive comments are also welcome!


NLTK resources


the book



Some important NLTK modules include the following:


coming soon!



The beginning of a list of NLP problems:


coming soon!




WHAT I LEARNED TODAY...



17/6/2016



#1 Bring back a list of occurrences of a keyword, including context - i.e. the rest of the text around it.

Just enter something like the following:


>>> text3.concordance("lived")
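To get a feel for what a concordance is doing, here is a rough plain-Python sketch of the idea. It is not NLTK's own implementation: the function name `concordance`, the `width` parameter and the sample sentence are all made up for illustration, and this version simply ignores case.

```python
def concordance(tokens, keyword, width=3):
    """Return each occurrence of keyword with up to `width` tokens of context per side."""
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Made-up sample text (an assumption, not taken from any NLTK corpus).
sample = "the old man lived by the sea and his dog lived with him".split()
for line in concordance(sample, "lived"):
    print(line)
```

Each printed line shows the keyword in square brackets with a few words on either side, which is the "context" the concordance view is all about.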


#2 Bring back a list of words which occur in a similar context to one another. For example


>>> text1.similar("monstrous")

will bring back all words with a similar context to that of the keyword "monstrous". Context is defined as the words on either side of the keyword.


#3 Bring back common contexts for two or more words. By 'contexts', we just mean the words on either side of a particular word we're interested in. See #2. So if we wanted to find out which contexts 'tea' and some other word have in common, we would first test for words with similar contexts:


>>> text2.similar("tea")

which brings back the following results:


all herself edward me them him last first family marianne her himself company present same affair modesty coach gold show

Then we'd pick, say, "modesty":


>>> text2.common_contexts(["tea","modesty"])



which brings back just the one result:


the_and
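The notion of "context" in #2 and #3 can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the helper names (`build_contexts`, `similar`, `common_contexts`) and the sample sentence are invented here, and NLTK's real methods rank and filter their results in ways this sketch does not attempt.

```python
from collections import defaultdict


def build_contexts(tokens):
    """Map each word to the set of (left, right) neighbour pairs it appears between."""
    contexts = defaultdict(set)
    for i in range(1, len(tokens) - 1):
        contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return dict(contexts)


def similar(tokens, word):
    """Other words that share at least one context with `word`."""
    contexts = build_contexts(tokens)
    target = contexts.get(word, set())
    return sorted(w for w, ctx in contexts.items() if w != word and ctx & target)


def common_contexts(tokens, words):
    """Contexts shared by every word in `words`, written as left_right."""
    contexts = build_contexts(tokens)
    shared = set.intersection(*(contexts.get(w, set()) for w in words))
    return sorted(f"{left}_{right}" for left, right in shared)


# Made-up sample text (an assumption, not taken from any NLTK corpus).
sample = "she drank the tea and left while he took the coffee and stayed".split()
print(similar(sample, "tea"))
print(common_contexts(sample, ["tea", "coffee"]))
```

In the sample, "tea" and "coffee" both sit between "the" and "and", so they count as similar, and their one shared context is written in the same left_right style as the result above.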


#4 We can determine the location of a word in the text
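As a small sketch of what "location" means here: a text is just a sequence of words, so a word's location is its offset counted in words from the start. The helper name `positions` and the sample sentence below are my own, made up for illustration.

```python
def positions(tokens, word):
    """Return every offset (counted in words from the start) where `word` occurs."""
    return [i for i, token in enumerate(tokens) if token == word]


# Made-up sample text (an assumption, not taken from any NLTK corpus).
sample = "the cat saw the dog and the dog saw the cat".split()
print(sample.index("dog"))       # offset of the first occurrence only
print(positions(sample, "dog"))  # offsets of every occurrence
```

Offsets start at zero, so the first word of the text is at location 0.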

#5 So what is a "lexical dispersion plot"? You can display positional information using a dispersion plot.


>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])


did not work for me, so I had a look on Stack Overflow and found this answer: "dispersion_plot not working inspite of installing matplotlib". Then I entered:


>>> from nltk.draw.dispersion import dispersion_plot

>>> dispersion_plot(text1, ['monstrous'])



We have a problem, alas. Can't seem to see the plot!
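While the plot window refuses to show itself, the same positional information can be roughed out in plain text. The sketch below is a stand-in of my own, not NLTK's `dispersion_plot`: the function name `text_dispersion`, the `width` parameter and the sample sentence are all assumptions made for illustration.

```python
def text_dispersion(tokens, words, width=40):
    """Return one row per word, with '|' marking the slices of the text where it occurs."""
    label_width = max(len(w) for w in words)
    rows = []
    for word in words:
        cells = ["."] * width
        for i, token in enumerate(tokens):
            if token == word:
                # Scale the word offset down to one of `width` columns.
                cells[i * width // len(tokens)] = "|"
        rows.append(f"{word.rjust(label_width)} {''.join(cells)}")
    return rows


# Made-up sample text (an assumption, not taken from any NLTK corpus).
sample = "freedom is good and duties are good but freedom wins".split()
for row in text_dispersion(sample, ["freedom", "duties"], width=10):
    print(row)
```

Each row reads left to right from the start of the text to the end, so clusters of `|` marks show where a word is used most.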

Friday 17 June 2016

Making Up For Lost Time

Or... the day in the life of a Freelance Software Engineer...

17/6/2016 

It has been such a long while since I added to this blog that I thought, since I'm somewhat rusty, I'd set aside a page to use as a "rough book". (Did you have to use rough books at school too?) This could be thought of as a non-linear approach to writing an index of articles I will write, and a diary of article ideas as and when I think of them.

#1 Achieving the Linux/Ubuntu dual boot on an Acer laptop, or... what you'll need to learn in order to get back data from a lost Windows 10 operating system, gobbled up by Ubuntu...

#2 Three years coding and working in data

#3 Navigating the Open Source Community on GitHub, and how to contribute








Friday 31 August 2012

HTML Editors: Another P2PU Challenge!

So...

In the end, I opted for simplicity combined with versatility. I found myself feeling comfortable with two editors: Free JavaScript Editor and jEdit. Neither is built especially for HTML, but both have a facility for saving HTML documents and recognising all the various components by making them pretty colours!

jEdit is the more versatile in terms of format. However, I like the Free JavaScript Editor because it works with CSS and JavaScript too, and this is how I will be working with HTML. Plus, I love the facility for debugging JavaScript code!

My first P2PU challenges

First encounter with P2PU challenges...


I discovered P2PU through wonderful Google. Joining P2PU is part of my own personal challenge to gain fluency in programming, and to eventually get a job doing it!

The first challenge I took part in was to introduce myself, hence the existence of this blog: 'Seraphina's Scratch Pad'. Challenge #2 was to copy some HTML by hand. See below.

I wrote it myself by hand. #2 Writing HTML by Hand
I had to find some JavaScript challenges too. So I discovered "What is JavaScript", and decided to dive in straight away!

To be continued....

Hello World by Seraphina

How it looks, typed out normally.


<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Hello World by Seraphina</title>
  </head>
<body>
    <h1>Code by fledgling programmer!</h1>
    <p>
    Code written in my own words...
    </p>
</body>
</html>


...and translated into English:
Hello World by Seraphina

Code by fledgling programmer!

Code written in my own words...


My first steps



Hello everyone, and welcome to my blog!



Here's a little about myself...



I've been making avant-garde films for ten years, and recently took up a maths degree at the Open University. It seemed natural that every mathematician should learn to program, so around a year ago I started learning HTML, CSS and JavaScript. More recently, I discovered free Stanford online courses, AI and Machine Learning being two I completed, and I'm currently learning Python through Udacity.



The P2PU "Hello World" challenge


I shall provide some tasty links during my next free moment, but must dash!