Chapter 2 Proposal

2.1 Research topic

Our research project focuses on the rental data in NYC covering more than a decade till 2022. Rental is always a heated topic in NYC, which is a metropolis with the largest city population exceeding 8 million. According to $New \space York \space Post$, in June 2022 the average rental price in Manhattan incredibly was $\$5,058$, unprecedented in NYC history. (https://nypost.com/article/where-nyc-real-estate-rental-market-stands-right-now-housing-prices/) We plan to study how the rental prices has changed in recent years, with variation in locations (boroughs, neighborhoods, etc.) to see the chronological and geographical trends and on the prices. Moreover, we plan to do research on the potential relationship between the prices and other factors, such as rental inventories, crime rate and occupancy/vacancy rate by different visualization tools in R together with assistance of other statistical tests. Overall we try to analyze which factor(s) have more or less influence on rental prices and if certain boroughs, neighborhoods put heavier pressure on people who try to live there.

2.2 Data availability

2.2.1 DOF Condominium Comparable Rental Income in NYC

url: https://data.cityofnewyork.us/City-Government/DOF-Condominium-Comparable-Rental-Income-in-NYC/9ck6-2jew This dataset contains the basic information of all apartments in New York, including geographic location information (coded by borough ID and priority ID) and house properties. Collected by The Department of Finance (DOF) through the investigation of apartment information, it is highly reliable and basically error free; Because it includes detailed address and house information, this dataset is very convenient for us to analyze the geographical distribution of rental prices; At the same time, according to the detailed geocoding, we can jointly analyze this dataset with other data to draw further conclusions. The data is stored in csv format, and we can use read.csv() to import directly. According to the inspection, there are no missing items in the data column we use.

2.2.2 NYC crime

url: https://data.cityofnewyork.us/Public-Safety/NYC-crime/qb7u-rbmr This dataset contains the overall crime information in New York this year. Collected by New York City Police Department (NYPD), it is also highly reliable(After all, it’s first-hand information); As it contains the position and other detailed information about the crimes, we can analyze the relationship between the crime rate and the rental fee in NYC. The data is stored in csv format, and we can use read.csv() to import directly. According to the inspection, there are no missing items in the data column we use.

2.2.3 School Locations

url: https://data.cityofnewyork.us/Education/2017-2018-School-Locations/p6h4-mpyy This dataset contains the overall school information in New York this year. Collected by Department of Education (DOE), it is also highly reliable; As it contains the location and other detailed information(such as level, and average grades) about the schools, we can analyze the relationship between the school locations and the rental fee in NYC. The data is stored in xlsx format, and we can use read_excel() to import directly. According to the inspection, there are no missing items in the data column we use.

2.2.4 Median gross rent

url:https://data.census.gov/cedsci/all?t=Housing&g=0400000US36%248600000 This dataset contains different housing information in New York State provided by the United States Census Bureau(USCB). USCB is a principal agency of the U.S. Federal Statistical System and is in charge of gathering information on the population and economy of the country. We filter the tables by geography to all 5-digit ZIP code tabulation area in New York State and find the topics we are interested in(e.g. median gross rent). We will process the table by choosing the ZIP code in New York City only to concentrate on our topic. Since a fraction of areas has missing data, we will use the available methods to fill. The dataset can be downloaded as csv file which can be easily imported by R.