# Netherlands Accommodation Prices (FCG)
    
The Dutch housing crisis is one of the biggest problems that residents face.

Due to multiple factors, such as population growth and a shortage of construction workers, the availability of housing has decreased significantly. This decline has pushed rent to sky-high prices, which leaves many wondering whether they are being taken advantage of.

To answer this question, your task is to predict the rent of a house from its data (e.g. location, size, facilities).

_Uncomment the next cell to install the libraries._ The `!` before `pip` tells the notebook to run the line as a shell command rather than as Python code. When working on a local computer, it is sufficient to run the same commands (without the `#` and the `!`) in a terminal.

In [1]:
# !pip install numpy
# !pip install pandas

In [2]:
import pandas as pd
import numpy as np

## Loading data

Download the CSV data files from Kaggle and place them in a folder `./datasets/` in the same directory as this notebook. In Python, we load CSV files with a library called `pandas`. Pandas is an open-source library for storing and manipulating tabular datasets.

- Pandas stores data in NumPy-backed columns, which makes it efficient on large tabular datasets
- As a result, loading and processing data is typically much faster than handling it row by row with standard-library modules such as `csv`
- Pandas contains several useful features for Machine Learning such as:
    - Data cleaning
    - Data inspection
    - Statistical analysis
    - Data normalization
    - Loading and storing data
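
As a rough illustration of the features listed above, the sketch below runs one small example of each on a toy dataframe. The values are copied from the five sample rows shown in this notebook; the variable names and the choice to treat a missing `additionalCosts` as zero are assumptions made only for this example.

```python
import pandas as pd

# Toy data: a few columns from the five sample rows shown in this notebook
listings = pd.DataFrame(
    {
        "city": ["Rotterdam", "Assen", "Maastricht", "Rotterdam", "Alphen aan den Rijn"],
        "areaSqm": [14, 16, 16, 25, 10],
        "rent": [500, 290, 425, 600, 425],
        "additionalCosts": [50.0, None, None, None, None],
    }
)

# Data inspection: count the missing values in each column
missing = listings.isna().sum()

# Data cleaning: assume (for illustration only) that a missing cost means zero
listings["additionalCosts"] = listings["additionalCosts"].fillna(0.0)

# Statistical analysis: mean rent per city
mean_rent = listings.groupby("city")["rent"].mean()

# Data normalization: min-max scale the area to the range [0, 1]
area = listings["areaSqm"]
listings["areaScaled"] = (area - area.min()) / (area.max() - area.min())

print(missing)
print(mean_rent)
print(listings)
```

Whether filling missing costs with zero is appropriate for the real dataset is a modelling decision; dropping the rows or using the median are equally plausible choices.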


A CSV (comma-separated values) file is a text file that uses commas to separate values. An example is provided below:

```
id,title,city,postalCode
0,West-Varkenoordseweg,Rotterdam,3074HN
3,Ruiterakker,Assen,9407BG
8,Brusselseweg,Maastricht,6217GX
10,Donkerslootstraat,Rotterdam,3074WL
12,Vorselenburgstraat,Alphen aan den Rijn,2405XJ
```

While the above data might be difficult for a human to read at first glance, machines can parse such files quickly. To read the file, we'll use the `read_csv` function.

Read more about the available functions and methods in the [documentation](https://pandas.pydata.org/docs/).
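
To see what `read_csv` does without needing the downloaded files, the sketch below parses a few lines of the example above from an in-memory string. The use of `io.StringIO` and the variable names are assumptions for illustration; the notebook itself reads from `datasets/train.csv`.

```python
import io

import pandas as pd

# Three rows taken from the CSV example above
csv_text = """id,title,city,postalCode
3,Ruiterakker,Assen,9407BG
8,Brusselseweg,Maastricht,6217GX
10,Donkerslootstraat,Rotterdam,3074WL
"""

# StringIO wraps the string so that read_csv can treat it like a file;
# index_col="id" uses the id column as the row index
sample = pd.read_csv(io.StringIO(csv_text), index_col="id")

print(sample)
print(sample.loc[8, "city"])  # look up a row by its id
```

Indexing by `id` means rows are addressed by their listing id rather than by position, which is convenient later when predictions have to be matched to ids.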

In [3]:
# read the csv file from path, and index it by id
train = pd.read_csv('datasets/train.csv', index_col='id')

# show first 5 rows
train.head()

| id | title | city | postalCode | latitude | longitude | areaSqm | firstSeenAt | lastSeenAt | isRoomActive | rawAvailability | ... | matchAge | matchGender | matchCapacity | matchLanguages | matchStatus | coverImageUrl | additionalCosts | rent | deposit | registrationCost |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | West-Varkenoordseweg | Rotterdam | 3074HN | 51.896601 | 4.514993 | 14 | 2019-07-14 11:25:46.511000+00:00 | 2019-07-26 22:18:23.142000+00:00 | True | 26-06-'19 - Indefinite period | ... | 16 years - 99 years | Not important | 1 person | Not important | Not important | https://resources.kamernet.nl/image/913b4b03-5... | 50.0 | 500 | 500.0 | 0.0 |
| 3 | Ruiterakker | Assen | 9407BG | 53.013494 | 6.561012 | 16 | 2019-07-14 11:25:46.988000+00:00 | 2019-07-18 22:00:31.174000+00:00 | False | 16-06-'19 - Indefinite period | ... | 18 years - 32 years | Female | 1 person | Not important | Student, Working student | https://resources.kamernet.nl/image/84e95365-6... | NaN | 290 | 290.0 | NaN |
| 8 | Brusselseweg | Maastricht | 6217GX | 50.860841 | 5.671673 | 16 | 2019-07-14 11:25:47.814000+00:00 | 2019-08-10 00:14:27.130000+00:00 | True | 15-07-'19 - Indefinite period | ... | 16 years - 40 years | Male | 4 persons | Dutch English | Student | https://resources.kamernet.nl/image/6e625591-d... | NaN | 425 | 425.0 | 25.0 |
| 10 | Donkerslootstraat | Rotterdam | 3074WL | 51.893195 | 4.516478 | 25 | 2019-07-14 11:25:48.140000+00:00 | 2019-07-16 06:05:32.183000+00:00 | False | 01-08-'19 - Indefinite period | ... | 21 years - 99 years | Not important | 4 persons | Dutch English Spanish French Italian German Po... | Student, Working student, Working, Looking for... | https://resources.kamernet.nl/image/ea3aea77-0... | NaN | 600 | 1200.0 | 0.0 |
| 12 | Vorselenburgstraat | Alphen aan den Rijn | 2405XJ | 52.122335 | 4.661434 | 10 | 2019-07-14 11:25:48.465000+00:00 | 2019-08-01 00:02:40.516000+00:00 | True | 08-07-'19 - Indefinite period | ... | 22 years - 40 years | Not important | 1 person | Dutch English | Student, Working student, Working | https://resources.kamernet.nl/image/d0780298-b... | NaN | 425 | 425.0 | NaN |


## Data analysis and machine learning

Now that the data is loaded, we can perform data analysis and train machine learning models. This is a job for you!
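
As one possible starting point (not a recommended model), the sketch below compares two simple baselines on the `areaSqm` and `rent` values of the five sample rows shown earlier: always predicting the median rent, and a least-squares line fitted with `np.polyfit`. The variable names are illustrative, and mean absolute error is used only as an example metric; this notebook does not state which metric the competition uses.

```python
import numpy as np
import pandas as pd

# areaSqm and rent of the five sample rows shown earlier in this notebook
sample = pd.DataFrame(
    {"areaSqm": [14, 16, 16, 25, 10], "rent": [500, 290, 425, 600, 425]}
)
area = sample["areaSqm"].to_numpy(dtype=float)
rent = sample["rent"].to_numpy(dtype=float)

# Baseline 1: always predict the median rent
median_rent = np.median(rent)

# Baseline 2: least-squares line, rent ~ slope * areaSqm + intercept
slope, intercept = np.polyfit(area, rent, deg=1)
predicted = slope * area + intercept

# Mean absolute error of each baseline on the same five rows
mae_median = np.mean(np.abs(rent - median_rent))
mae_line = np.mean(np.abs(rent - predicted))

print(f"median baseline MAE: {mae_median:.1f}")
print(f"linear baseline MAE: {mae_line:.1f}")
```

Five rows are far too few to draw conclusions, and the errors here are measured on the data the line was fitted to. On the full training set, hold out part of the data for validation before trusting any score.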

## Making a submission file

Once the machine learning model is trained, we can feed it the test data, which does not contain the target values. To evaluate our model, we need to create a submission file and upload it to Kaggle. The submission file is a CSV file that consists of the `id`s of the data points and their predicted target values. An example of the submission file for the competition's dataset is given below.

In [4]:
# For demonstration purposes, we will load the submission file from an existing csv
submission = pd.read_csv('datasets/sample_submission.csv', index_col='id')

submission.head()

| id | rent |
|---|---|
| 1 | 0 |
| 2 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |


In [5]:
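# Normally, the submission dataframe is built from the model's predictions on the test data.
# For demonstration, we fill it with random integers drawn from the range of training rents.
# Note: the upper bound of np.random.randint is exclusive.
# Assign the whole column at once; chained indexing such as submission['rent'][i] = ...
# may write to a temporary copy and is not reliable.
submission['rent'] = np.random.randint(
    train['rent'].min(),
    train['rent'].max(),
    size=len(submission)
)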
    
submission.head()

| id | rent |
|---|---|
| 1 | 1391 |
| 2 | 2419 |
| 4 | 5381 |
| 5 | 5972 |
| 6 | 5224 |


Once the submission dataframe is ready, we can write it to a CSV file using the `to_csv()` method, which takes the desired file name as an argument. Passing `index=True` keeps the `id` index as the first column.

In [6]:
submission.to_csv('submission.csv', index=True)
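
With a real model, the dataframe would be built from predictions rather than random numbers. The sketch below shows one way this might look, assuming a list of test ids and an array of predicted rents; `test_ids`, `predicted_rent` and `demo_submission` are hypothetical names, and the prediction values are made up. It writes to an in-memory buffer instead of a file so that the result can be inspected directly.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inputs: ids of the test rows and made-up model predictions
test_ids = [1, 2, 4, 5, 6]
predicted_rent = np.array([512.4, 300.0, 431.7, 598.2, 420.0])

# Build the submission with id as the index and rent as the only column;
# rounding to whole euros is a choice made here, not a stated requirement
demo_submission = pd.DataFrame(
    {"rent": np.round(predicted_rent).astype(int)},
    index=pd.Index(test_ids, name="id"),
)

# Write the CSV into a buffer and inspect the text that would be uploaded
buffer = io.StringIO()
demo_submission.to_csv(buffer, index=True)
csv_text = buffer.getvalue()
print(csv_text)

# Read it back to confirm the file has the expected id and rent columns
round_trip = pd.read_csv(io.StringIO(csv_text), index_col="id")
```

Reading the file back is a cheap check that the header is `id,rent` and that every test id appears exactly once before uploading.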

Now that we have the submission file in csv format, we can upload it to Kaggle for evaluation.