Text Analysis and the Humanities

This is a workshop introducing humanities students to the main concepts of computational text analysis. Here you can find material and code to analyse the Scottish Statistical Accounts data and see computational methods in practice.

Content of the Repository

The repository contains both the code and the material you need to explore the dataset. If you only want to view the notebook without running its content, you can view it at this address and this address.

Prerequisites

Some basic familiarity with Python Notebooks is assumed, but no previous knowledge of text analysis is required.

Learning Outcomes

Working knowledge of principal methods to clean and wrangle text-based data
Awareness of the pre-processing steps of text analysis, such as tokenisation and stop-word removal
Working knowledge of basic techniques of text analysis, such as word frequencies and keywords in context
Awareness of unstructured data visualisation techniques
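
As a rough illustration of the techniques named above (tokenisation, stop-word removal, word frequencies and keywords in context), here is a minimal sketch using only the Python standard library. The sample sentence, the tiny stop-word list and the function names are made up for this example; they are not taken from the workshop notebook, which may use different tools.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real projects use a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "in", "is", "to", "are"}


def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in stop_words]


def keywords_in_context(tokens, keyword, window=3):
    """Return each occurrence of `keyword` with up to `window` tokens either side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Made-up sample text, for illustration only.
sample = "The parish of Cross and Burness is in Orkney. The parish is small."

tokens = tokenise(sample)
content_words = remove_stop_words(tokens)
frequencies = Counter(content_words)

print(frequencies.most_common(3))
for left, word, right in keywords_in_context(tokens, "parish", window=2):
    print(f"{left:>12} [{word}] {right}")
```

Note that the keyword search runs on the full token list rather than the filtered one, so the context still reads like the original sentence.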

How to use it

1. On Noteable

Open the following link in a new tab https://noteable.edina.ac.uk/login
Log in with your EASE credentials
Select Standard Python 3 and press start
On the top bar, go to Git > Clone a Repository
Enter the link to this repo 'https://github.com/DCS-training/Text-Analysis-and-the-Humanities'
Press Clone
On the left side, you can now see a folder with the same name as the repo, containing a series of files. Double-click 'NotebookTextAnalysis.ipynb' to open it

2. Run the notebooks via Google Colab

Open Google Colab: https://colab.research.google.com
If you are not already logged in, you will be prompted to log in with your Google account
Go to the GitHub header, click on Open Colab
Select GitHub
Enter the link to this repo 'https://github.com/DCS-training/Text-Analysis-and-the-Humanities'
Click on 'NotebookTextAnalysis.ipynb' to open the notebook

Using the Notebook

The Notebook contains paragraphs of explanatory text interspersed with grey cells containing code blocks. To run a code block and see the result:

Place your cursor within the cell
Click the 'Run' button on the top menu
The results of running this code will appear below
If the results don't appear immediately, check the icon in the browser tab. An egg-timer icon indicates that the code is still being processed.
It is best to follow the Notebook from top to bottom, as some code blocks will depend on results from previous cells
You can edit code blocks yourself and run them to see the results of your changes

Data

The data have been collected from the Scottish Statistical Accounts Data https://collectionsmanager.is.ed.ac.uk/handle/10683/119269 and the National Records of Scotland https://www.nrscotland.gov.uk/statistics-and-data/geography/our-products/other-national-records-of-scotland-nrs-geographies-datasets/historic-civil-parishes-pre-1891.

The Statistical Accounts of Scotland are a series of documentary publications, related in subject matter though published at different times, covering life in Scotland in the 18th and 19th centuries.

The Old (or First) Statistical Account of Scotland was published between 1791 and 1799 by Sir John Sinclair of Ulbster. The New (or Second) Statistical Account of Scotland was published under the auspices of the General Assembly of the Church of Scotland between 1834 and 1845. These first two Statistical Accounts of Scotland are unique records of life during the agricultural and industrial revolutions in Europe.

Data Structure

The original publications have been scanned and OCRed, and each record is stored in its own .txt file. The name of each file encodes information about the document itself. For example, StAS.2.15.91.P.Orkney.Cross_and_Burness

StAS.2.15.91 -> Second Statistical Account
P -> Parish (the record contains information about a parish)
Orkney -> Area of interest (Scotland has been divided into 33 areas)
Cross_and_Burness -> Parish
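
The naming scheme above can be unpacked with a simple split on the dots. The following is a minimal sketch, not the notebook's own code: the function name and field names are hypothetical, and it assumes every file name has the same seven dot-separated parts as the example. The two numbers after the account number are not explained in this README, so they are kept as unlabelled strings.

```python
from typing import NamedTuple, Tuple


class RecordName(NamedTuple):
    series: str               # e.g. "StAS"
    account: str              # e.g. "2" for the Second Statistical Account
    numbers: Tuple[str, str]  # two fields this README does not explain
    record_type: str          # e.g. "P" for parish
    area: str                 # e.g. "Orkney"
    parish: str               # e.g. "Cross and Burness"


def parse_record_name(filename: str) -> RecordName:
    """Split a file name like the README example into its parts."""
    # Drop a trailing ".txt" if the name still carries its extension.
    stem = filename[:-4] if filename.lower().endswith(".txt") else filename
    parts = stem.split(".", 6)
    if len(parts) != 7:
        raise ValueError(f"Unexpected file name format: {filename!r}")
    series, account, num_a, num_b, record_type, area, parish = parts
    return RecordName(
        series,
        account,
        (num_a, num_b),
        record_type,
        area.replace("_", " "),    # underscores stand in for spaces
        parish.replace("_", " "),
    )


record = parse_record_name("StAS.2.15.91.P.Orkney.Cross_and_Burness.txt")
print(record.area, "/", record.parish)
```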

We will see later how to use these file names to extract information about all our texts. The first step, though, is to create a single dataframe (table) containing all the texts; otherwise the data will be very difficult to manage.
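
One possible way to gather the texts into a single table is sketched below. It is an illustration under stated assumptions rather than the notebook's actual code: the function name `load_records` and the folder path are hypothetical, and the files are assumed to be UTF-8 text named as in the example above. It uses only the standard library and builds a list of row dictionaries, one per file.

```python
from pathlib import Path


def load_records(folder):
    """Read every .txt file in `folder` into a list of row dictionaries."""
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        # path.stem drops only the final ".txt" and keeps the dotted name.
        parts = path.stem.split(".", 6)
        has_all_fields = len(parts) == 7
        rows.append({
            "file": path.name,
            "area": parts[5] if has_all_fields else None,
            "parish": parts[6].replace("_", " ") if has_all_fields else None,
            # OCR output can contain odd bytes, so replace rather than fail.
            "text": path.read_text(encoding="utf-8", errors="replace"),
        })
    return rows


# Hypothetical usage; "data" is an assumed folder name:
# rows = load_records("data")
```

If pandas is available, passing the resulting list to `pandas.DataFrame(rows)` gives a dataframe with one row per record and one column per key.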

NB: The data are hosted in a separate repository, https://github.com/DCS-training/EFIDeepDive, so that cloning this repo for teaching stays fast. We will import the dataset into our environment directly from the notebook.

Licence of the material

All the material collected here is covered by a CC BY-NC 4.0 licence.
