CS917 Coursework 1

Deadline: Monday week 6 (4 November) at 12.00 noon. Please read the entire sheet before

starting on Part A.

Background

Through lectures and exercise sheets you have gained some useful experience of the

multi-paradigm programming language Python. In this coursework we would like you

to use this knowledge to solve a number of real-world problems based on the analysis

of company stock data.

The dataset that you will be using is taken from the Financial Times Stock Exchange

100 Index (the FTSE 100). This lists the 100 companies registered on the London Stock

Exchange with the highest market capitalisation. The FTSE 100 is seen as a gauge of

the prosperity of businesses regulated by UK company law and there is much interest

(and speculation) in the volatility of this index as the UK moves through various

BREXIT scenarios.

Data

The data that you need for this coursework is in a single CSV file called ftse100.csv,

which can be found on the module website. Within the CSV file there is data on the

value of stock prices for each company on the FTSE 100. Each row of the data file is

formatted as follows:

1. ‘date’: the day of the stock information, format dd/mm/yyyy

2. ‘time’: the time of the stock information, format hh:mm

3. ‘code’: the FTSE 100 company code

4. ‘name’: the full name of the company

5. ‘currency’: the currency the company is valued in

6. ‘price’: the valuation of the stock at the date and time

7. ‘diff’: the price difference between the current and last recorded value

8. ‘per_diff’: the percentage difference between the current and last recorded

value

The data has been collected at 15-minute intervals from Monday 14 October to Friday

18 October 2019.

Part A (25 Marks)

In this first part you are required to define 5 functions.

The first of these functions is

daily_movement(data, code, date) -> float

which, given the data, a stock code, and a date (in the format dd/mm/yyyy), will return

a positive or negative floating point number (to 2 decimal places) that is the price

movement of that stock on that date.

The second function is

daily_high(data, code, date) -> float

which, given the data, a stock code, and a date, will return a positive floating point

number (to 2 decimal places) that is the highest price for that stock on the given date.

The third function

daily_low(data, code, date) -> float

provides the equivalent functionality to that above but now returns the lowest price

for that stock on the given date.

The fourth function

daily_avg(data, code, date) -> float

provides the average price (to 2 decimal places) for a given stock on a given date.

Finally, the fifth function

percentage_change(data, code, date) -> float

should return the percentage change for a particular company code on a particular

date.
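
As a rough illustration of how the first four functions might be approached, the sketch below filters the rows for one company and one date and then summarises the prices. It rests on assumptions that this sheet does not confirm: that the price column converts cleanly with float(), that rows appear in chronological order, and that "movement" means the last recorded price of the day minus the first. The helper name _prices and the sample rows are made up for the example. percentage_change is left out because its exact definition is not given here.

```python
def _prices(data, code, date):
    """Return the prices (as floats) recorded for `code` on `date`, in file order."""
    return [float(row["price"]) for row in data
            if row["code"] == code and row["date"] == date]


def daily_high(data, code, date):
    return round(max(_prices(data, code, date)), 2)


def daily_low(data, code, date):
    return round(min(_prices(data, code, date)), 2)


def daily_avg(data, code, date):
    prices = _prices(data, code, date)
    return round(sum(prices) / len(prices), 2)


def daily_movement(data, code, date):
    # Assumption: movement = last price of the day minus the first price.
    prices = _prices(data, code, date)
    return round(prices[-1] - prices[0], 2)


# Hand-made rows in the same shape as csv.DictReader output (prices are invented).
sample = [
    {"date": "14/10/2019", "time": "08:00", "code": "III", "price": "100.0"},
    {"date": "14/10/2019", "time": "08:15", "code": "III", "price": "104.5"},
    {"date": "14/10/2019", "time": "08:30", "code": "III", "price": "102.0"},
    {"date": "14/10/2019", "time": "08:00", "code": "ABF", "price": "250.0"},
]
print(daily_high(sample, "III", "14/10/2019"))      # 104.5
print(daily_avg(sample, "III", "14/10/2019"))       # 102.17
print(daily_movement(sample, "III", "14/10/2019"))  # 2.0
```

Note that max() and min() raise a ValueError when no rows match, so a full solution would need to decide what to do for an unknown code or date.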

The code skeleton below can be found on the module website and you should add

your solutions to a copy of this file. If you use this code skeleton then marking should

be very easy using our ready-made test harness.

Skeleton files for all parts are provided. Below is an example of the skeleton code

provided for Part A:

"""
Part A
Please provide definitions for the following functions
"""
import csv

# daily_movement(data, code, date) -> float
# daily_high(data, code, date) -> float
# daily_low(data, code, date) -> float
# daily_avg(data, code, date) -> float
# percentage_change(data, code, date) -> float

# Replace the body of this main function for your testing purposes
if __name__ == "__main__":
    # Start the program
    # Example variable initialization
    # data is always the ftse100.csv read in using a DictReader
    data = []
    with open("ftse100.csv", "r") as f:
        reader = csv.DictReader(f)
        data = [r for r in reader]
    # code is always a string value
    code = "III"
    # date is always a string formatted 'dd/mm/yyyy'
    date = "14/10/2019"
    # access individual rows from data using list indices
    first_row = data[0]
    # to access row values, use relevant column heading in csv
    print(f"code = {first_row['code']}")
    print(f"price = {first_row['price']}")
    print(f"date = {first_row['date']}")
    pass

Details of the marking scheme that we will use can be found below.

Part B (25 Marks)

In this exercise you are first expected to create a portfolio. This is simply a list of

company codes which a particular trader is interested in. You must ensure that each

code in the portfolio is a valid FTSE 100 company (of course), and a portfolio must

contain at least 1 and at most 100 companies.

create_portfolio() -> [code]

To implement this function, you will need to ask for inputs from the user using the

input functions from previous labs. Each input should ask for a single company code.

When all the company codes have been entered for this portfolio, input ‘EXIT’ to exit

the input loop, and then return the portfolio.
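
One possible shape for the input loop is sketched below. It is not the required interface: collect_codes, its valid_codes argument and its read argument are hypothetical names introduced so that the loop can be exercised without a keyboard, and create_portfolio() itself would still take no arguments. Converting entries to upper case and silently skipping duplicates are also assumptions.

```python
def collect_codes(valid_codes, read=input, max_size=100):
    """Prompt for company codes until 'EXIT' is entered; return the valid, unique ones."""
    portfolio = []
    while len(portfolio) < max_size:
        entry = read("Enter a company code (or EXIT to finish): ").strip().upper()
        if entry == "EXIT":
            break
        if entry not in valid_codes:
            print(f"'{entry}' is not a known FTSE 100 code; ignored.")
        elif entry in portfolio:
            print(f"'{entry}' is already in the portfolio; ignored.")
        else:
            portfolio.append(entry)
    return portfolio


# Simulate a user typing four lines so the example runs unattended.
typed = iter(["iii", "XXXX", "ABF", "EXIT"])
print(collect_codes({"III", "ABF", "AZN"}, read=lambda prompt: next(typed)))
# ['III', 'ABF']
```

In a real solution the set of valid codes would presumably be built from the 'code' column of ftse100.csv. The sketch does not enforce the "at least 1 company" rule; how to react to an empty portfolio is left open.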

Next you are required to write a function that will find the best x investments in a

portfolio for a particular period:

best_investments(data,portfolio,x,start_date,end_date) -> [code]

The function should take the following parameters: The FTSE 100 data; The portfolio

in question (which is a list of company codes); the number of investments that the

trader is interested in (this must be a number between 1 and the number of

companies in the portfolio inclusive); and a start and end date, each in the format

dd/mm/yyyy.

There are a lot of opportunities for error here: x must be less than or equal to the

portfolio size; the start date must be earlier than the end date; and so on. In all cases where it is

not possible to return a valid answer, or where you think the function should raise

an exception, your function should instead return an empty list of codes, i.e. [].
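
The checks described above could be gathered into a single guard, as in the sketch below. The helper names parse_date and inputs_are_valid are hypothetical, and the use of the standard datetime module is an assumption (check that it counts as covered in the lectures before relying on it).

```python
from datetime import datetime


def parse_date(text):
    """Return a datetime for a 'dd/mm/yyyy' string, or None if it is malformed."""
    try:
        return datetime.strptime(text, "%d/%m/%Y")
    except (TypeError, ValueError):
        return None


def inputs_are_valid(portfolio, x, start_date, end_date):
    """Check the constraints listed above before any ranking is attempted."""
    start, end = parse_date(start_date), parse_date(end_date)
    if start is None or end is None or start >= end:
        return False
    # bool is a subclass of int, so rule it out explicitly.
    if not isinstance(x, int) or isinstance(x, bool):
        return False
    return 1 <= x <= len(portfolio)


print(inputs_are_valid(["III", "ABF"], 2, "14/10/2019", "18/10/2019"))  # True
print(inputs_are_valid(["III", "ABF"], 3, "14/10/2019", "18/10/2019"))  # False
print(inputs_are_valid(["III", "ABF"], 1, "18/10/2019", "14/10/2019"))  # False
```

A function such as best_investments could then begin with "if not inputs_are_valid(...): return []" and only afterwards look at the data.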

In a similar fashion, now define the function:

worst_investments(data,portfolio,x,start_date,end_date) -> [code]

The parameters are to be defined in the same way as above, and the function should

return [ ] in all scenarios where it is not possible to calculate a valid answer.
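
Because the two functions differ only in the direction of the ordering, they could share one ranking step. The sketch below assumes that a numeric score per company has already been computed for the chosen period; this sheet does not say how "best" should be measured, so the scores dictionary, its values and the name rank_codes are all placeholders.

```python
def rank_codes(scores, x, best=True):
    """Return the x codes with the highest (best=True) or lowest (best=False) scores."""
    ordered = sorted(scores, key=lambda code: scores[code], reverse=best)
    return ordered[:x]


# Invented scores, e.g. some measure of price change over the period.
scores = {"III": 1.8, "ABF": -0.4, "AZN": 0.9}
print(rank_codes(scores, 2))              # ['III', 'AZN']
print(rank_codes(scores, 2, best=False))  # ['ABF', 'AZN']
```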

A code skeleton for Part B can also be found on the module website, so please add

your solutions to this. Details of the marking scheme that we will use can be found

below.

Part C (25 Marks)

As Data Analysts, we should be capable of interrogating and understanding our data.

Many of us, however, find it quite difficult to understand hundreds or possibly

thousands of lines of csv. To help with this one might want to visualise the data in a

graph.

In this exercise, you will be utilising the matplotlib.pyplot library to visualise the

trends of the stock prices in the data file. You will again find a code skeleton on the

module website which you should use when developing your solutions.

You need to implement two functions for Part C. The first function is

plot_company(data, code, start_date, end_date)

which, given the code of a particular company and a start and end date (in the format

dd/mm/yyyy), will output a line graph of the stock over the time period from the start

date to the end date. The function should not return a value, but should instead

output the plot to a file called plot1.png.

The second function

plot_portfolio(data, portfolio, start_date, end_date)

is similar to the first function, except that it should plot lines for each stock in the

portfolio. Due to the large variance in share prices, it may not be feasible to plot some

companies on the same graph. Luckily for us the matplotlib package comes with a

subplots function, allowing us to plot multiple graphs in the same figure.

Using the subplots function, create and save a plot containing multiple graphs. To

make coding easier, the maximum number of subplots expected within one figure

will be 6.

The function should not return a value, but should instead output the plot to a file

called plot2.png.

Each graph generated by plot_company and plot_portfolio should have a title,

a legend of all of the codes present in the graph, suitable scales for both axes, and

labelled axes with units identified.
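
A minimal sketch of the subplot approach follows, assuming matplotlib is installed. The names grid_shape and plot_series, the layout rule (a single row for up to three graphs, two rows otherwise) and the axis label text are illustrative choices, not requirements from this sheet. The series argument is assumed to map each company code to a pair (x_values, prices) that has already been extracted from the data.

```python
import math


def grid_shape(n):
    """Return (rows, cols) for n >= 1 subplots: one row up to 3, two rows up to 6."""
    cols = math.ceil(n / 2) if n > 3 else n
    rows = math.ceil(n / cols)
    return rows, cols


def plot_series(series, filename):
    """Draw one subplot per code and save the figure to `filename`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    rows, cols = grid_shape(len(series))
    fig, axes = plt.subplots(rows, cols, squeeze=False,
                             figsize=(5 * cols, 4 * rows))
    flat = [ax for row in axes for ax in row]
    for ax, (code, (xs, prices)) in zip(flat, series.items()):
        ax.plot(xs, prices, label=code)
        ax.set_title(f"{code} share price")
        ax.set_xlabel("Reading number (15-minute intervals)")
        ax.set_ylabel("Price (quoted currency)")
        ax.legend()
    for ax in flat[len(series):]:
        ax.set_visible(False)  # hide any unused cell, e.g. the 6th when n = 5
    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)


print(grid_shape(5))  # (2, 3)
```

A call such as plot_series({"III": ([0, 1, 2], [100.0, 104.5, 102.0])}, "plot2.png") would then write a one-panel figure; the numbers here are invented.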

Part D (25 Marks)

In this exercise you are required to create a new Company class. This class will

encapsulate the code you created in Part A so that Company objects can be created.

Each Company will have a data variable which will store entries from ftse100.csv

related to this company. The specification for the new class is as follows:

Company:

#Instance variables

#Functions

● daily_movement(date) -> float

● daily_high(date) -> float

● daily_low(date) -> float

● daily_avg(date) -> float

● percentage_change(date) -> float
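
To show how the Part A logic might move into a class, here is a partial sketch. The constructor signature, the code attribute and the private helper _prices are assumptions, since the instance variables are not listed in this copy of the specification; only three of the five methods are shown, and the sample rows are invented.

```python
class Company:
    """Holds the rows of ftse100.csv that belong to one company code."""

    def __init__(self, code, data):
        # Assumption: the constructor receives all DictReader rows
        # and keeps only those for this company.
        self.code = code
        self.data = [row for row in data if row["code"] == code]

    def _prices(self, date):
        """Prices recorded on `date`, as floats, in file order."""
        return [float(row["price"]) for row in self.data if row["date"] == date]

    def daily_high(self, date):
        return round(max(self._prices(date)), 2)

    def daily_low(self, date):
        return round(min(self._prices(date)), 2)

    def daily_avg(self, date):
        prices = self._prices(date)
        return round(sum(prices) / len(prices), 2)


rows = [
    {"date": "14/10/2019", "code": "III", "price": "100.0"},
    {"date": "14/10/2019", "code": "III", "price": "103.0"},
    {"date": "14/10/2019", "code": "ABF", "price": "250.0"},
]
company = Company("III", rows)
print(company.daily_high("14/10/2019"), company.daily_avg("14/10/2019"))  # 103.0 101.5
```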

If we want to meaningfully participate in the stock market, it is not enough to just

analyse data, we also need to be able to predict future stock prices for a given

company. To do this we will implement a linear regression model to identify the trend

and predict the next day’s stock prices.

There are various maths libraries in Python that might help you do this, but we would

like you to implement this feature by hand. Define a function called

predict_next_average, which takes an instance of the Company class, calculates

the average price for each of the 5 days of data in ftse100.csv and uses these to predict what

you think the average price will be for the next day of trading.

predict_next_average(company) -> float

For those not familiar with linear regression models, the algorithm to generate a

simple linear model is available below. In this algorithm, m is the gradient of a

straight line and b is the y-intercept. To apply this model to our dataset,

you will need to take x to be the day (Monday is day 0, Tuesday is day 1, etc.) and y to be

the average price for that day.

The resulting model has the form y = mx + b, a straight line from

which we can extrapolate the average price for day 5 in our sequence (which in reality would

be the following Monday, as the FTSE 100 is closed over the weekend).
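
The algorithm referred to above is not reproduced in this copy of the sheet. The sketch below uses the standard ordinary least-squares formulas for a straight line, m = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and b = (Σy - mΣx) / n, which may or may not match the exact form originally shown. The function name linear_regression and the sample averages are made up for the example.

```python
def linear_regression(xs, ys):
    """Return (m, b) of the least-squares line y = m*x + b through the points."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


days = [0, 1, 2, 3, 4]                           # Monday .. Friday
averages = [100.0, 102.0, 104.0, 106.0, 108.0]   # invented, perfectly linear
m, b = linear_regression(days, averages)
print(m, b)                  # 2.0 100.0
print(round(m * 5 + b, 2))   # 110.0, the extrapolated value for day 5
```

Keeping the regression in its own function means the same code can be reused on other series, such as daily highs and lows.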

For many analysis techniques, it is not enough to predict the next stock prices or

averages. Instead we will want to classify companies based on how the stocks have

evolved over the past 5 days. Next implement a function that will return a string

classifier that will identify if a stock’s value is ‘increasing’, ‘decreasing’, ‘volatile’ or

‘other’:

classify_trend(company) -> str

To do this, perform a linear regression on the daily high and daily low for a

given company and determine whether the highs and lows are increasing or

decreasing. You will most likely need to use the linear regression algorithm you

implemented for predicting the average price, so it may be useful to make the

regression its own function.

The classification system works as follows: If the daily highs are increasing and the

daily lows are decreasing, this means that the stock prices have been fluctuating

over the past 5 days so assign ‘volatile’ to your result string. If the daily highs and

daily lows are both increasing, this likely means that the overall stock prices are

increasing so assign ‘increasing’. Likewise if the daily highs and lows are both

decreasing then assign ‘decreasing’. We currently only care about these 3

classifications so if a company's shares do not follow any of the above trends assign

‘other’ to your result string.
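
Once the gradients of the daily-high and daily-low regression lines are known, the rules above reduce to a few comparisons. The sketch below assumes that "increasing" means a strictly positive gradient and "decreasing" a strictly negative one, so that a perfectly flat line falls under 'other'; classify_from_slopes is a hypothetical helper that classify_trend could call.

```python
def classify_from_slopes(high_slope, low_slope):
    """Map the gradients of the daily highs and daily lows to a trend label."""
    if high_slope > 0 and low_slope < 0:
        return "volatile"
    if high_slope > 0 and low_slope > 0:
        return "increasing"
    if high_slope < 0 and low_slope < 0:
        return "decreasing"
    return "other"  # e.g. highs falling while lows rise, or a flat line


print(classify_from_slopes(1.2, -0.8))   # volatile
print(classify_from_slopes(0.5, 0.3))    # increasing
print(classify_from_slopes(-0.5, -0.3))  # decreasing
print(classify_from_slopes(-0.5, 0.3))   # other
```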

The marking scheme for Part D can also be found below.

Coursework Submission and Marking

Deadline: Monday week 6 (4 November) at 12.00 noon. Coursework in the department is

nearly always submitted through Tabula. The advantage of using this system is that

you can be assured that the work has been submitted, a secure record of the work is

kept, feedback is easy to distribute and, where appropriate, code can be automatically

run and checked for correctness. Instructions on how to register on Tabula and the

steps to follow to submit your work will be posted on the module webpage shortly.

Please note the university requires that late penalties apply, so if you do submit your

work late (by even 5 minutes!) you will be penalised.

You are required to submit four separate files for this coursework: parta.py,

partb.py, partc.py, partd.py. Each of these files will be run on the FTSE 100 test data

so that it can be checked for correctness. We will also judge each solution on coding

style and how well you have made use of the various features of Python that we have

covered in the lectures.

The marking scheme for the coursework is as follows:

Part A

The parta.py file will have the following functions tested:

1. daily_movement(data, code, date) -> float

2. daily_high(data, code, date) -> float

3. daily_low(data, code, date) -> float

4. daily_avg(data, code, date) -> float

5. percentage_change(data, code, date) -> float

Each function will be tested on four (code, date) combinations and checked against

our own solutions. Thus twenty tests will be run in total for Part A and there are 20

possible marks available for these tests. In addition, marks will be available for

coding style (2 marks) and how well you have made use of the various language

features of Python (3 marks).

In your feedback for this exercise you will be told how many of the twenty tests your

code passes, and also how many marks were awarded for coding style and use of

language features.

Part B

The partb.py file will have the following functions tested:

1. create_portfolio() -> [code]

2. best_investments(data,portfolio,x,start_date,end_date) -> [code]

3. worst_investments(data,portfolio,x,start_date,end_date) -> [code]

All three functions will be tested to ensure they follow the specification. This includes

scenarios where the functions should return [] as an answer. This means that your

solution should be able to handle incorrect input (as outlined above) as well as input

that will yield a result.

create_portfolio is worth 4 marks and will be subject to 4 tests which will test its

ability to handle both correct and incorrect inputs.

best_investments and worst_investments will be subject to 8 tests each (i.e. 8

marks for each function with 1 mark per test). These tests will consist of both valid

and invalid inputs. For these tests, the portfolios supplied will always be valid,

however the value of x, and the start_date and end_date may be invalid and

therefore require a [] output.

As in Part A, 2 marks will be assigned for coding style and 3 marks for the use of

Python language features.

Part C

For Part C, the code in your partc.py file will be run with 5 sets of test input. The

plot_company function will be tested with two valid FTSE 100 company codes; the

plot_portfolio function will be tested with three valid test portfolios. The

resulting graphs for each of the five tests (in plot1.png and plot2.png) will be

visually inspected and marks will be allocated for each of the following features:

● The correctness of the graph (1 mark)

● The graph has a title (1 mark)

● The x and y axes both have labels and units (1 mark)

● A legend to identify the data (1 mark)

● That appropriate scales are calculated for the x and y axes (1 mark)

Thus your code will be expected to generate 5 graphs, each of which is worth 5 marks.

Part D

The partd.py file will have the following functions tested:

1. company.daily_movement(date) -> float

2. company.daily_high(date) -> float

3. company.daily_low(date) -> float

4. company.daily_avg(date) -> float

5. company.percentage_change(date) -> float

The implementation of the Company class is worth 5 marks, with one mark allocated to

each of these required class functions.

We will then test the two functions:

1. predict_next_average(company) -> float

2. classify_trend(company) -> str

The predict_next_average function will be tested on 5 random companies from

the FTSE 100. Each correct answer is worth 2 marks, giving 10 marks in total.

The classify_trend function will be tested on 5 random companies, but not

necessarily the same as those above, and each test will be worth 2 marks, for a total of 10

marks.

Please note that no marks will be assigned for solutions that use additional

imported libraries not covered in the lectures to solve these questions.

Working Independently and Completion

It is important that you complete this coursework independently. Warwick has quite

strict rules about this, and if a submitted solution is deemed to be a copy of another

then there are harsh penalties. The Tabula system has built-in checking software that

is quite sophisticated: if variable names are changed but the structure of the code is

the same, then it will spot this; if you reorder the code, then it will spot this also; if you

change the comments, then the checking software will know. So rather than trying to

fool the system - which you will not be able to do - it is best just to complete the work

by yourself. You do not need to do all the questions to get a decent mark, so just do

what you can on your own … and then you will do fine.

This coursework is designed to have stretch goals. Part D in particular is quite

complicated and we are not expecting everyone to do this. If you are able to get Parts

A-C done well, then you will likely score between 65% and 75%, which is a good mark

which you should be pleased with.

Good luck.