CSC108H | Data Analysis | Python | Data Processing

Write a program that analyzes poetry, counting syllables and looking for rhymes.

Poetry

Introduction

In this assignment, you will write a program to analyze poetry, counting syllables and looking for rhymes.
This handout explains the problem you are to solve and the tasks you need to complete for the assignment. Please read it carefully.

Goals of this Assignment

  • Write function bodies using dictionaries and file reading.
  • Write code to mutate lists and dictionaries.
  • Use top-down design to break a problem down into subtasks and implement helper functions to complete those tasks.
  • Write tests to check whether a function is correct.

Files in the download

Please download the Assignment 3 files and extract the zip archive.

  • Starter code:
    • poetry_reader.py and poetry_functions.py
      These are the only files you need to modify and submit. These two files contain the headers for the functions you will need to write for this assignment, and a few completed function docstrings. Many of these functions will be called by the main program ( poetry.py ). You can, and should, write some helper functions in these files. Your lives will be easier if you do.
  • Helper module: poetry_constants.py
    Read this! This file contains several definitions of new types that we use in the function type annotations.
  • Main Program: poetry.py
    Run this first. The file contains a program that calls the functions in the starter code files. You can run it now, although it won’t work properly until you complete the functions in the starter files. Still, you’ll be able to use this to check your progress.
  • Data: poetry/*.txt
    In the poetry directory are several files containing poems that you can use to test your code.
  • Data: dictionary.txt
    This file contains a huge list of English words and their pronunciations.
  • Data: poetry_forms.txt
    This file contains information describing various poetic forms.
  • Checker: a3_checker.py
    We have provided a checker program that you should use to check your code. See below for more information about a3_checker.py .

Poetry Forms

Poetry differs from prose because it has a fixed structure. Different forms of poetry, such as sonnets and haiku, have rules about which words must rhyme and the number of syllables in each line.

In this assignment, you will write a program to read a poem from a file, figure out the pronunciation, count the number of syllables in each line, and determine which lines rhyme.

Some poetry forms specify the number and order of stressed and unstressed syllables within a line. We will not consider syllabic stress in this assignment.

Some poetry forms specify that particular words must alliterate, or start with the same sound. We will not consider alliteration in this assignment.

Definitions

All links go to https://dictionary.com/ .

  • poem (https://www.dictionary.com/browse/poem) a composition in verse, especially one that is characterized by a highly developed artistic form and by the use of heightened language and rhythm to express an intensely imaginative interpretation of the subject
  • rhyme (https://www.dictionary.com/browse/rhyme) a word agreeing with another in terminal sound: Find is a rhyme for mind and womankind
  • consonant (https://www.dictionary.com/browse/consonant) (in English articulation) a speech sound produced by occluding with or without releasing (p, b; t, d; k, g), diverting (m, n, ng), or obstructing (f, v; s, z, etc.) the flow of air from the lungs (opposed to vowel)
  • vowel (https://www.dictionary.com/browse/vowel) (in English articulation) a speech sound produced without occluding, diverting, or obstructing the flow of air from the lungs (opposed to consonant)
  • syllable (https://www.dictionary.com/browse/syllable) an uninterrupted segment of speech consisting of a vowel sound, a diphthong, or a syllabic consonant, with or without preceding or following consonant sounds

There are many vowel sounds. For example, freight, fraught, fruit, and fright all contain different vowel sounds; there are far more vowel sounds than there are letters used to describe them: a, e, i, o, u, and sometimes y.

Poetry Form Example: Limerick

Here is a stupendous work of limerick art. The lines have been numbered, and the last syllables of the lines matter because they must rhyme according to a particular scheme. There are two sets of rhyming words: rhyme, time, and slime form one set, and instead and head form the other.

  1. I wish I had thought of a rhyme
  2. Before I ran all out of time!
  3. I’ll sit here instead,
  4. A cloud on my head
  5. That rains ‘til I’m covered with slime.

Limericks are five lines long. Lines 1, 2, and 5 have eight syllables and the last syllables on these lines rhyme with each other. Lines 3 and 4 have five syllables and the last syllables rhyme with each other. (There are additional rules about the location and number of stressed vs. unstressed syllables, but we’ll ignore those rules for this assignment; we will be counting syllables, but not paying attention to whether they are stressed or unstressed.)

The CMU Pronouncing Dictionary

We’ll need a way to examine words and break them into syllables and consonants. We’re going to use the Carnegie Mellon University Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) , a dictionary that stores pronunciations instead of definitions. It uses a plain-text notation for various sounds; the quickest way to get used to the notation is to go look at some. You don’t need to memorize it, but it helps to see it. Head to the CMU Pronouncing Dictionary (http://www.speech.cs.cmu.edu/cgi-bin/cmudict) now and look up a few words, and see if you can interpret the results. Do contractions work? What about possessives like “Rita’s”?

Now click the “Show Lexical Stress” checkbox and see how that changes the results.

Here is the output for David (with “Show Lexical Stress” turned on): D EY1 V IH0 D . There are five phonemes in the word David and each phoneme describes a sound. The sounds are either vowel sounds or consonant sounds. We will refer to phonemes that describe vowel sounds as vowel phonemes, and similarly for consonants.

The phoneme notation is called Arpabet (http://en.wikipedia.org/wiki/Arpabet) ; it was developed under the Advanced Research Projects Agency (ARPA) (http://en.wikipedia.org/wiki/Advanced_Research_Projects_Agency) back in the 1970s.

We have downloaded a text file containing the CMU Pronouncing Dictionary: all the words and their pronunciations. All vowel phonemes end in a 0 , 1 , or 2 , with the digit indicating a level of syllabic stress. Consonant phonemes do not end in a digit. The number of syllables in a word is the same as the number of vowel sounds in the word, so you can determine the number of syllables in a word by counting the number of phonemes that end in a digit.

As an example, in the word “secondary” ( S EH1 K AH0 N D EH2 R IY0 ), there are 4 vowel phonemes, and therefore 4 syllables. The vowel phonemes are EH1 , AH0 , EH2 , and IY0 .

In case you’re curious, 0 means unstressed, 1 means primary stress, and 2 means secondary stress. Try saying “secondary” out loud to hear for yourself which syllables have stress and which do not. (In this assignment, your program will not need to distinguish between the levels of syllabic stress.)
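The counting rule described above can be sketched in a few lines. This is only an illustration: the function name count_syllables is a made-up name, not one of the required functions, and it assumes a word's pronunciation is given as a list of phoneme strings.

```python
from typing import List


def count_syllables(phonemes: List[str]) -> int:
    """Return the number of vowel phonemes in phonemes.

    Vowel phonemes end in a digit (0, 1, or 2); consonant phonemes do not.
    Hypothetical helper for illustration only.
    """
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


# Pronunciations copied from the handout text.
secondary = 'S EH1 K AH0 N D EH2 R IY0'.split()
david = 'D EY1 V IH0 D'.split()

print(count_syllables(secondary))  # 4
print(count_syllables(david))      # 2
```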

The assignment zipfile includes dictionary.txt , which contains our version of the Pronouncing Dictionary. You must use this file, not any files from the CMU website, because our version differs slightly from the CMU version. We have removed alternate pronunciations for words, and we have removed words that do not start and end with alphanumeric characters (like #HASH-MARK , #POUND-SIGN and #SHARP-SIGN ). Open dictionary.txt to see the format; notice that any line beginning with ;;; is a comment.
The words in dictionary.txt are all uppercase and do not contain surrounding punctuation. When your program looks up a word, use the uppercase form, with no leading or trailing punctuation. Function clean_up in the starter code file poetry_functions.py will be helpful here.
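Here is a rough sketch of how lines in this kind of file might be turned into a Python dictionary. It assumes each non-comment line holds a word followed by whitespace and then its phonemes, so check that against dictionary.txt itself. The names read_pronunciations and normalize_word are invented for this example; normalize_word is only a stand-in for the idea behind clean_up, not its actual implementation.

```python
import string
from io import StringIO
from typing import Dict, List, TextIO


def normalize_word(word: str) -> str:
    """Return word in uppercase with leading and trailing punctuation removed.

    Hypothetical stand-in for the idea behind clean_up.
    """
    return word.strip(string.punctuation).upper()


def read_pronunciations(dictionary_file: TextIO) -> Dict[str, List[str]]:
    """Return a dict mapping each word in dictionary_file to its phonemes.

    Assumes each entry looks like 'WORD  PH1 PH2 ...' (an assumption).
    """
    word_to_phonemes = {}
    for line in dictionary_file:
        line = line.strip()
        # Skip blank lines and comment lines, which begin with ';;;'.
        if line and not line.startswith(';;;'):
            parts = line.split()
            word_to_phonemes[parts[0]] = parts[1:]
    return word_to_phonemes


sample = StringIO(';;; a comment line\nDAVID  D EY1 V IH0 D\nTIME  T AY1 M\n')
pronunciations = read_pronunciations(sample)
print(pronunciations[normalize_word('time!')])  # ['T', 'AY1', 'M']
```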

Describing Poetry Forms

Here is our limerick poetry form:

Limerick
8 A
8 A
5 B
5 B
8 A

On each line, the first piece of information is a number that indicates the number of syllables required on that line of the poem. The second piece of information on each line is a letter that indicates the rhyme scheme. Here, lines 1, 2, and 5 must rhyme with each other because they’re all marked with the same letter ( A ), and lines 3 and 4 must rhyme with each other because they’re both marked with the same letter ( B ). (Note that the choice to use the letters A and B was arbitrary. Other letters could have been used to describe this rhyme scheme.)

Two lines of a poem rhyme with each other when the last syllables of the last words on the two lines rhyme. Two syllables rhyme when their vowels are the same and they end in the same sequence of consonant phonemes, like gosh and wash.
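One way to read this rule is: take everything from the last vowel phoneme of a word to its end, and compare those endings. The sketch below does that. The names last_syllable and words_rhyme are invented, the pronunciations are typed in by hand and should be checked against dictionary.txt, and comparing the vowel together with its stress digit is an assumption rather than something the handout specifies.

```python
from typing import List


def last_syllable(phonemes: List[str]) -> List[str]:
    """Return the phonemes from the last vowel phoneme to the end.

    Return an empty list if there is no vowel phoneme.
    """
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1].isdigit():
            return phonemes[i:]
    return []


def words_rhyme(first: List[str], second: List[str]) -> bool:
    """Return True iff the last syllables of first and second match.

    Assumption: vowels are compared including their stress digit.
    """
    return last_syllable(first) == last_syllable(second)


gosh = ['G', 'AA1', 'SH']   # assumed pronunciation
wash = ['W', 'AA1', 'SH']   # assumed pronunciation
head = ['HH', 'EH1', 'D']   # assumed pronunciation

print(words_rhyme(gosh, wash))  # True
print(words_rhyme(gosh, head))  # False
```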

Some poetry forms don’t require lines that rhyme. For example, the haiku form has 5 syllables in the first line, 7 in the second line, and 5 in the third line, but there are no rhyme requirements. Here is an example:

Dan's hands are quiet.
Soft peace surrounds him gently:
No thought moves the air.

And another one:

Jen sits quietly,
Thinking of assignment three.
All ideas bad.

We’ll indicate the lack of a rhyme requirement by using the symbol * . Here is our poetry form description for the haiku poetry form:

Haiku
5 *
7 *
5 *

Some poetry forms have rhyme requirements but don’t have a specified number of syllables per line. Quintain (English) is one such example; these are 5-line poems with an ABABB rhyme scheme, but with no syllable requirements. Here is our poetry form description for the Quintain (English) poetry form (notice that 0 is used to indicate that there is no requirement on the number of syllables in the line):

Quintain (English)
0 A
0 B
0 A
0 B
0 B

Here’s an example of a Quintain (English) from Percy Bysshe Shelley’s Ode To A Skylark:

Teach us, Sprite or Bird,
What sweet thoughts are thine:
I have never heard
Praise of love or wine
That panted forth a flood of rapture so divine.

Your program will read a poetry form description file containing a list of poetry form names and their poetry form descriptions. For each poetry form in the file:

  • the first line gives the name of the poetry form
  • subsequent lines contain the number of syllables and rhyme scheme for each line of poetry
  • each poetry form is separated from the next by a blank line

The poetry form names given in a poetry form description file are all different.

We have provided poetry_forms.txt as an example poetry form description file. We will test your code with other poetry form descriptions as well.
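To make the file layout concrete, here is a minimal sketch of reading such a file into a dictionary that maps each form name to a pair of lists (syllable counts, rhyme symbols). The function name read_forms is made up for this example, and the sketch assumes every description line has exactly two whitespace-separated pieces.

```python
from io import StringIO
from typing import Dict, List, TextIO, Tuple


def read_forms(forms_file: TextIO) -> Dict[str, Tuple[List[int], List[str]]]:
    """Return a dict mapping each form name to (syllable counts, rhymes).

    Hypothetical helper; forms are assumed to be separated by blank lines.
    """
    forms = {}
    name = None
    for line in forms_file:
        line = line.strip()
        if not line:
            # A blank line ends the current form.
            name = None
        elif name is None:
            # The first line of a form is its name.
            name = line
            forms[name] = ([], [])
        else:
            count, rhyme = line.split()
            forms[name][0].append(int(count))
            forms[name][1].append(rhyme)
    return forms


sample = StringIO('Limerick\n8 A\n8 A\n5 B\n5 B\n8 A\n\nHaiku\n5 *\n7 *\n5 *\n')
forms = read_forms(sample)
print(forms['Haiku'])  # ([5, 7, 5], ['*', '*', '*'])
```

Because the name is taken as the whole line, names that contain spaces, such as Quintain (English), are kept intact.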

Stanza-based poetry

Many poetry forms don’t have a fixed number of lines. Instead, they specify what a stanza looks like, and then the poetry is made up of as many stanzas as the poet likes.

As an example drawn from Narodnaya Volya literature, here are the first two stanzas of a poem called The Beauteous Terrorist. The author, Henry Parkes, was inspired by Sophia Perovskaia, a prominent member of the Narodnaya Volya, to write the poem. Each stanza follows a simple ABAB rhyme scheme.

SOFT as the morning's pearly light,
Where yet may rise the thunder cloud,
Her gentle face was ever bright
With noble thought and purpose proud.

Dreamt ye that those divine blue eyes,
That beauty free from pride or blame,
Were fashioned but to terrorize
O'er Despot's power of sword and flame?

We will not consider stanza-based poems in this assignment.

Data Representation

We use the following Python definitions to create new types relevant to the problem domain. Read the comments in starter code file poetry_constants.py for detailed descriptions with examples.

Domain                  Type
POETRY_FORM             Tuple[List[int], List[str]]
POETRY_FORMS            Dict[str, POETRY_FORM]
CLEAN_POEM              List[List[str]]
WORD_PHONEMES           List[str]
LINE_PRONUNCIATION      List[WORD_PHONEMES]
POEM_PRONUNCIATION      List[LINE_PRONUNCIATION]
PRONOUNCING_DICTIONARY  Dict[str, WORD_PHONEMES]
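To see how these types nest, here is a small sketch that builds one value of several of them. The type definitions are re-declared locally so the example runs on its own; the real ones live in poetry_constants.py. The interpretation of CLEAN_POEM as a list of lines, each a list of cleaned-up words, is an assumption, and the pronunciations are typed in by hand rather than read from dictionary.txt.

```python
from typing import Dict, List

# Local copies of the types, for illustration only.
WORD_PHONEMES = List[str]
LINE_PRONUNCIATION = List[WORD_PHONEMES]
POEM_PRONUNCIATION = List[LINE_PRONUNCIATION]
CLEAN_POEM = List[List[str]]
PRONOUNCING_DICTIONARY = Dict[str, WORD_PHONEMES]

# A tiny hand-written dictionary (assumed pronunciations).
tiny_dictionary: PRONOUNCING_DICTIONARY = {
    'ALL': ['AO1', 'L'],
    'IDEAS': ['AY0', 'D', 'IY1', 'AH0', 'Z'],
    'BAD': ['B', 'AE1', 'D'],
}

# One line of a poem, already cleaned up: uppercase, no punctuation.
clean_poem: CLEAN_POEM = [['ALL', 'IDEAS', 'BAD']]

# Look up every word of every line to get the poem's pronunciation.
poem_pronunciation: POEM_PRONUNCIATION = [
    [tiny_dictionary[word] for word in line] for line in clean_poem
]

print(poem_pronunciation[0][2])  # ['B', 'AE1', 'D']
```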

A note on StringIO

So far in this course, we have been using TextIO to read and write files. StringIO works a lot like TextIO, but its input comes from a string rather than from a file. It has all the methods that we have used with TextIO, including read() and readlines(). For a comprehensive list, feel free to call help on StringIO in Python!
For example:

>>> from io import StringIO
>>> test_string = "1\n2\n3"
>>> print(test_string)
1
2
3
>>> string_io = StringIO(test_string)
>>> for line in string_io.readlines():
...     print(line.strip())
...
1
2
3

So why are we using this?
In situations where an IO object is expected, we can build a StringIO from a string and pass that in, rather than creating a new file, writing text to it, and closing it. There are some other differences that are beyond the scope of this course. If you are interested, you can read the official Python documentation: https://docs.python.org/3.7/library/io.html#io.StringIO

Required Functions

This section contains a table with detailed descriptions of the functions that you must complete in the two starter code files. You’ll need to add a second example to the docstrings for each function in the starter code.

For all poetry samples used in this assignment, you should assume that all words in the poems will appear as keys in the pronouncing dictionary. We will test with other pronouncing dictionaries, but we will always follow this rule.

You should follow the approach we’ve been using on large problems recently and write additional helper functions to break these high-level tasks down. Each helper function must have a clear purpose. Each helper function must have a complete docstring produced by following the Function Design Recipe. You should test your helper functions to make sure they work!

A3 Checker

We are providing a checker module ( a3_checker.py ) that tests two things:

  • whether your code follows the Python Style Guidelines, and
  • whether your functions are named correctly, have the correct number of parameters, and return the correct types.

To run the checker, open a3_checker.py and run it. Be sure to scroll up to the top and read all messages.
If the checker passes for both style and types:

  • Your code follows the style guidelines.
  • Your function names, number of parameters, and return types match the assignment specification. This does not mean that your code works correctly in all situations. We will run a different set of tests on your code once you hand it in, so be sure to thoroughly test your code yourself before submitting.

If the checker fails, carefully read the message provided:

  • It may have failed because your code did not follow the style guidelines. Review the error description(s) and fix the code style. Please see the PyTA documentation for more information about errors.
  • It may have failed because:
    • you are missing one or more functions,
    • one or more of your functions is misnamed,
    • one or more of your functions has the incorrect number or type of parameters, or
    • one or more of your function return types does not match the assignment specification.

Read the error message to identify the problematic function, review the function specification in the handout, and fix your code. Make sure the checker passes before submitting.

Running the checker program on MarkUs

In addition to running the checker program on your own computer, run the checker on MarkUs as well. You will be able to run the checker program on MarkUs once every 12 hours (note: we may have to revert to every 24 hours if MarkUs has any issues handling every 12 hours). This can help to identify issues such as uploading the incorrect file.

First, submit your work on MarkUs. Next, click on the “Automated Testing” tab and then click on “Run Tests”. Wait for a minute or so, then refresh the webpage. Once the tests have finished running, you’ll see results for the Style Checker and Type Checker components of the checker program (see both the Automated Testing tab and results files under the Submissions tab). Note that these are not actually marks, just the checker results. If there are errors, edit your code, run the checker program again on your own machine to check that the problems are resolved, resubmit your assignment on MarkUs, and (if time permits) after the waiting period has elapsed, rerun the checker on MarkUs.

Testing your Code

It is strongly recommended that you test each function as soon as you write it. As usual, follow the Function Design Recipe (we’ve provided the function name and types for you) to implement your code. Once you’ve implemented a function, run it against the examples in your docstrings and the unit tests you’ve defined.
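As a reminder of what running docstring examples looks like, here is a minimal sketch that uses the standard doctest module. The function count_vowel_phonemes is a made-up helper, not one of the required functions; the point is only the pattern of writing examples first and then letting doctest check them.

```python
import doctest
from typing import List


def count_vowel_phonemes(phonemes: List[str]) -> int:
    """Return the number of phonemes in phonemes that end in a digit.

    >>> count_vowel_phonemes(['D', 'EY1', 'V', 'IH0', 'D'])
    2
    >>> count_vowel_phonemes(['S', 'T'])
    0
    """
    return sum(1 for phoneme in phonemes if phoneme[-1].isdigit())


# Run every docstring example in this module. Failures are printed;
# results.failed holds how many examples did not match.
results = doctest.testmod()
print(results.failed)
```

If an example fails, doctest prints the expected and actual values, which usually points straight at the bug.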

How to tackle this assignment

Principles

  • To avoid getting overwhelmed, deal with one function at a time. Start with functions that don’t call any other functions; this will allow you to test them right away. The steps listed below give you a reasonable order in which to write the functions.
  • For each function that you write, start by adding at least one example call to the docstring before you write the function.
  • Keep in mind throughout that any function you have might be a useful helper for another function. Part of your marks will be for taking advantage of opportunities to call an existing function.
  • As you write each function, begin by designing it in English, using only a few sentences. If your design is longer than that, shorten it by describing the steps at a higher level that leaves out some of the details. When you translate your design into Python, look for steps that are described at such a high level that they don’t translate directly into Python. Design a helper function for each of these high-level steps, and put a call to the helpers into your code. Don’t forget to write a great docstring for each helper!
