
Assessing Street Vegetation Rate in Manhattan, New York City

GSAPP, Columbia University     Instructor: Boyeong Hong

Collaborators: Yingjie Liu, Rae Lei, Jiayi Zhao, Shuhua Li     March 2022 - May 2022

Key Words: Street view imagery, Efficient neural network, Green space, Machine learning, Socioeconomic status.

Abstract

This project examines the street vegetation rate in the borough of Manhattan, New York City. We extract static street view images through the Google Street View Static API, process them with an efficient neural network (ENet) segmentation model, and compare the resulting vegetation rate with the socioeconomic status of each neighborhood at the census tract level.

Introduction

Covid-19 has brought many challenges to New York City. During this turbulent time, people have tended to spend more time in outdoor green spaces, rather than staying indoors, while keeping a safe social distance. More than ever, New Yorkers rely on parks and other outdoor green spaces, such as plazas and natural landscapes, to support their physical and mental health. Among all kinds of green space, we found street vegetation to be the most crucial factor affecting everyone's experience of public space and the dynamics of the street.

Another benefit of street green space is that streets form the most extensive, interconnected network of public spaces in our cities. The fabric of the street network gives it great potential to become a citywide, resilient ecosystem that contributes to personal wellness and the health of the natural environment. According to a streetscape design project by Gensler (Theeuwes, 2021), this expansive street network can provide protection against the effects of climate change. In the borough of Manhattan, streetscape space comprises about 30 to 35% of the overall land area, more than any other kind of public space.

Thus, to understand the current condition of street vegetation, this project examines the street greenery rate in New York City by analyzing street view photos and neighborhood features. Using several machine learning methods, we aim to answer the questions listed below:

1. Which characteristics of a neighborhood contribute most to a higher greenery rate? For instance, English proficiency, the population of specific age groups, or median household income.

2. Is there an obvious gap in greenery rate among the neighborhoods of New York City?

3. What kinds of urban design strategies or policies can be applied based on the analysis results?

2. Literature review

2.1 Using machine learning to examine street green space types at a high spatial resolution: Application in Los Angeles County on socioeconomic disparities in exposure

Fig 1. Image Segmentation

Source: (Sun et al., 2021)

Rather than relying on satellite images, Sun et al. (2021) use street view images to calculate the green ratio of an area, which can capture different types of green space, such as vegetation or terrain. Measured this way, the green ratio is closer to what people actually perceive. The paper divides green space into three types: trees, low-lying vegetation and grass. It has a clear structure, running from obtaining street view images and training the model to the final GLMM analysis. First, the team obtained socioeconomic factors online. Second, they downloaded dataset images for training the model, along with street view images of Los Angeles. Third, they used semantic segmentation to measure the total amount and the types of green space. Fourth, they used intersection over union (IoU) to evaluate performance. Fifth, they used the normalized difference vegetation index (NDVI) to compare street-level green space with satellite imagery-based green space. Finally, they used generalized linear mixed models (GLMMs) to examine the association between socioeconomic status (SES) factors and street green space level. The results show that the deep learning model is highly accurate, with a mean intersection over union of 92.5%, and that all three kinds of green space have negative associations with neighborhood SES. Both the workflow of the paper and the machine learning models it uses are of great reference value for this project.

2.2 Machine learning on high performance computing for urban greenspace change detection: satellite image data fusion approach

Fig 2. Steps in green space assessment

Source: (More et al., 2020)

Fast-changing urban regions require continuous and rapid detection of green space change. This study (More et al., 2020) uses a fusion approach to detect urban greenspace change in Mumbai, combining satellite image classification using a support vector machine (SVM) with spatio-spectral fusion of satellite image data. It proceeds in four steps: pre-processing, general information and mathematics, the support vector machine, and the spectral fusion approach for classification.

As shown in Figure 2, classification is performed on the fused data using the SVM to monitor green space changes over a period of 15 years. The results show a decreasing trend in green space in Mumbai, and the study concludes that machine learning with a spatio-spectral fusion approach performs well for green space analysis.

From this study, we learn that machine learning with a satellite image data fusion approach could also be used to detect greenspace in New York City.

3. Data and methods

Fig 3. Methodology framework

3.1 Study Population

This project covers Manhattan, one of the five boroughs of New York City. The city is an ideal location for research on greenery rates and the social injustice issues related to them, because it is one of the most populous and diverse cities in the world (>8.4 million people by 2019) (United States Census Bureau, 2018). During pandemics, marginalized low-income groups also tend to suffer more from both mental and physical health problems (NYC.gov, 2022).

3.2 Street View Images

We used the Google Street View Static API to request the street view images that form the test set. The API provides 360° panoramic imagery based on location information across large regions of the United States. For a single image request, the viewpoint is defined with URL parameters including location, fov, heading, pitch and radius, and the street view picture is returned in response to a standard HTTP request (Google, n.d.).

The road network information was obtained from the OpenStreetMap dataset and includes all information under the “street” category. The test viewpoints were selected evenly from the street network using the 3D modeling platform Rhino and its visual programming plugin, Grasshopper. The plugin Urbano was used to import GIS shapefile information into the Grasshopper environment, and the plugin Kangaroo was used to reduce overlapping points. The GIS location of each resulting viewpoint was then used to request the test street view images. In practice, we requested images with headings of 0, 90, 180 and 270 and a fov of 90.
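The request pattern described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function name, the `size`, `pitch` and `radius` defaults and the API key are assumptions, while the endpoint and the parameter names (location, heading, fov, pitch, radius) are those of the Street View Static API.

```python
from urllib.parse import urlencode

STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"
HEADINGS = (0, 90, 180, 270)  # the four directions requested per viewpoint


def build_street_view_urls(lat, lon, api_key, fov=90, pitch=0, radius=50,
                           size="640x640"):
    """Return one request URL per heading for a single viewpoint.

    pitch, radius and size are assumed defaults, not values from the report.
    """
    urls = []
    for heading in HEADINGS:
        params = {
            "size": size,
            "location": f"{lat},{lon}",
            "heading": heading,
            "fov": fov,
            "pitch": pitch,
            "radius": radius,
            "key": api_key,
        }
        urls.append(f"{STREET_VIEW_ENDPOINT}?{urlencode(params)}")
    return urls


# Build (but do not send) the four requests for one hypothetical viewpoint.
urls = build_street_view_urls(40.7812, -73.9665, api_key="YOUR_API_KEY")
```

Each URL can then be fetched with any HTTP client and the response body saved as an image file.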

3.3 Current social and neighborhood conditions

To explore the current socioeconomic condition of the neighborhoods, we downloaded a series of CSV files keyed by GEOID from the US Census database (U.S. Census Bureau, n.d.). They cover occupied housing units, median household income, monthly housing cost, total population, the population of different age groups and ethnic groups, disability rate and education level.

3.4 Machine learning model

We adopted a pre-trained model named the efficient neural network (ENet). Its architecture was proposed in 2016 by researchers at the Faculty of Mathematics, Informatics and Mechanics, University of Warsaw. ENet can perform pixel-wise semantic segmentation in real time: it is reported to be up to 18× faster, to require 75× fewer FLOPs and to have 79× fewer parameters than existing models while providing similar or better accuracy, so it does not rely on a supercomputer (Paszke et al., 2016).

ENet can distinguish the following classes (Sears-Collins, 2021):

Unlabeled, Road, Sidewalk, Building, Wall, Fence, Pole, TrafficLight, TrafficSign, Vegetation, Terrain, Sky, Person, Rider, Car, Truck, Bus, Train, Motorcycle, Bicycle.

Given the amount of test data and the hardware available to us, we chose this model.

3.5 Methodological framework

As shown in Figure 3, we developed our framework based on the literature review. The whole structure is divided into two parts. In the machine learning part, 2000 points were selected from the street network in Manhattan (NYC Open Data, n.d.), and their location data was extracted in GIS and passed as URL parameters to the Google Street View API to retrieve street view images. Each viewpoint requested four pictures, one in each of four directions. We then used the Efficient Neural Network model to identify two kinds of street vegetation, bushes and trees, through pixel-based semantic segmentation. Finally, we ran all the images through the pre-trained model and exported the ratio of pixels in the selected classes to the total number of pixels, which gives the greenery rate of each viewpoint.

This leads to the analysis part, which compares the greenery rate with multiple neighborhood features. The exported greenery rate data was grouped and averaged by census tract boundaries, and the tract-level average greenery rates were then merged with the neighborhood features on GEOID. We used LiDAR data as a comparison group to check the accuracy of the data, and the comparison went well. Then, by standardizing the data, filtering out meaningless records, fitting three different regression models and comparing their performance, we identified the correlation and significance level of each feature.

3.5.1 Image processing

To describe the analysis in more detail: to obtain the test grid matrix, we first imported the shapefile of the street network and extracted all street vertices, as shown in Figure 4. This produced more than 10,000 points, and calculating the greenery rate for the corresponding 40,000 images would have taken a long time. We therefore used Grasshopper to filter the data, deleting duplicated points and points that were too close to each other, which cut the number of points down to 2000. We then exported all points with their location data, as shown in Table 1.

Table 1. Coordinates of points
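The point filtering step was done in Grasshopper; as a rough stand-in, the same idea (drop duplicates and points closer than a minimum spacing) can be sketched in plain Python. The greedy strategy, the function names and the 50 m spacing are assumptions for illustration, not the settings used in the project.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))


def thin_points(points, min_distance_m):
    """Greedily keep points that are at least min_distance_m from every kept point.

    Exact duplicates are at distance 0, so they are dropped as well.
    """
    kept = []
    for point in points:
        if all(haversine_m(point, other) >= min_distance_m for other in kept):
            kept.append(point)
    return kept


# Hypothetical vertices: a duplicate, a near neighbor (~11 m) and a distant point.
vertices = [
    (40.7580, -73.9855),
    (40.7580, -73.9855),
    (40.7581, -73.9855),
    (40.7680, -73.9855),
]
viewpoints = thin_points(vertices, min_distance_m=50)
```

The pairwise check is quadratic in the number of points, which is acceptable at the scale of roughly 10,000 vertices; a spatial index would be the natural upgrade for larger networks.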

Once we had all the points, we passed the latitude and longitude of each point to the Google Street View API to retrieve the images needed for further analysis, as shown in Figure 5.

Fig 4. Street network and vertices

Fig 5. Static street view images

Next, we used semantic segmentation to extract the greenery rate. First, we imported the street view images and created a blob from each one, that is, a preprocessed image array in the format the network takes as input. Then we loaded the pre-trained neural network, set the blob as its input, and extracted the predicted probabilities for each class, such as vegetation, terrain, sidewalk and person. We also created a color-coded class legend, as shown in Figure 6. For this project we only need the vegetation and terrain areas, so we exported the vegetation rate and terrain rate of each image for the next step.

Fig 6. Image segmentation and legend
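The step from per-class scores to a vegetation and terrain rate can be sketched as below. This is a minimal pure-Python illustration that assumes the network output is available as nested lists indexed `scores[class][row][column]` and that the class indices follow the order of the class list in section 3.4; both are assumptions, and the function names are hypothetical.

```python
# Class order assumed to match the list given in section 3.4.
CLASS_NAMES = [
    "Unlabeled", "Road", "Sidewalk", "Building", "Wall", "Fence", "Pole",
    "TrafficLight", "TrafficSign", "Vegetation", "Terrain", "Sky", "Person",
    "Rider", "Car", "Truck", "Bus", "Train", "Motorcycle", "Bicycle",
]
GREEN_CLASSES = ("Vegetation", "Terrain")


def class_map(scores):
    """Assign each pixel the index of its highest-scoring class."""
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(len(scores)), key=lambda c: scores[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


def class_rates(labels, class_names=CLASS_NAMES, wanted=GREEN_CLASSES):
    """Return the share of pixels assigned to each wanted class."""
    total = sum(len(row) for row in labels)
    counts = {name: 0 for name in wanted}
    for row in labels:
        for index in row:
            name = class_names[index]
            if name in counts:
                counts[name] += 1
    return {name: count / total for name, count in counts.items()}


# Toy 2x2 "image": two vegetation pixels, one terrain pixel, one road pixel.
scores = [[[0.0, 0.0], [0.0, 0.0]] for _ in CLASS_NAMES]
scores[9][0][0] = 1.0   # Vegetation
scores[10][0][1] = 1.0  # Terrain
scores[1][1][0] = 1.0   # Road
scores[9][1][1] = 1.0   # Vegetation
rates = class_rates(class_map(scores))
```

In practice the scores would come from the pre-trained network and an array library would replace the nested loops; the ratio itself is computed the same way.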

After obtaining the vegetation and terrain rates, which represent the greenery rate of bushes and trees, we imported the CSV file into QGIS as a point layer with metadata and used a points-in-polygon operation to count the points falling into each census tract block and to sum the greenery rates of those points. Dividing the sum by the count gave the average greenery rate of each block, which we matched with 21 neighborhood features, as indicated in Figure 7 and Table 2.

Fig 7. Terrain and vegetation rate of each census tract block

Table 2. Vegetation rate with GEOID
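The averaging by census tract and the join with neighborhood features were done in QGIS; the logic can be sketched as follows, assuming each point has already been tagged with the GEOID of the tract it falls in. The column names and the sample values are hypothetical.

```python
from collections import defaultdict


def mean_rate_by_tract(point_rows):
    """Average the greenery rate of all points that fall in each tract."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in point_rows:
        sums[row["GEOID"]] += row["greenery_rate"]
        counts[row["GEOID"]] += 1
    return {geoid: sums[geoid] / counts[geoid] for geoid in sums}


def merge_with_features(tract_rates, feature_rows):
    """Attach the tract greenery rate to each feature row with a matching GEOID."""
    merged = []
    for features in feature_rows:
        geoid = features["GEOID"]
        if geoid in tract_rates:  # tracts without any viewpoint are skipped
            merged.append({**features, "greenery_rate": tract_rates[geoid]})
    return merged


# Hypothetical inputs: three viewpoints in two tracts, features for two tracts.
points = [
    {"GEOID": "36061000100", "greenery_rate": 0.25},
    {"GEOID": "36061000100", "greenery_rate": 0.75},
    {"GEOID": "36061000201", "greenery_rate": 0.125},
]
features = [
    {"GEOID": "36061000100", "median_household_income": 85000},
    {"GEOID": "36061000300", "median_household_income": 62000},
]
table = merge_with_features(mean_rate_by_tract(points), features)
```

Only tracts present in both inputs survive the merge, which mirrors an inner join on GEOID.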

3.5.2 LiDAR comparison

To check the accuracy of our results, we generated a map from the 2017 Light Detection and Ranging (LiDAR) data for comparison. This raster dataset is a land cover layer at 6-inch resolution, developed as part of an updated urban tree canopy assessment to represent a “top-down” mapping perspective (Department of Information Technology & Telecommunications, 2018). The data was imported into ArcGIS and split by the census tract shapefile to calculate green ratios by location.

Fig 8. LiDAR Land Cover Map

Fig 9. LiDAR Land Cover Map split by census tracts

Two of the 8 classes, Tree Canopy and Grass/Shrubs, were chosen for data processing. We divided the summed counts of Tree Canopy and Grass/Shrubs by the total count of all classes, and generated a CSV file containing the green ratio of each census tract block based on the LiDAR Land Cover dataset.

Fig 10. Classes of LiDAR Land Cover dataset

Fig 11. Green ratios with GEOID based on LiDAR Land Cover Map
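The LiDAR green ratio described above is a simple share of class counts. A minimal sketch, assuming the per-tract counts are available as a dictionary keyed by class name (the non-green class names and the numbers below are illustrative):

```python
GREEN_LIDAR_CLASSES = ("Tree Canopy", "Grass/Shrubs")


def lidar_green_ratio(class_counts):
    """Return (Tree Canopy + Grass/Shrubs) / all classes, or None if empty."""
    total = sum(class_counts.values())
    if total == 0:
        return None  # no raster cells fell inside this tract
    green = sum(class_counts.get(name, 0) for name in GREEN_LIDAR_CLASSES)
    return green / total


# Illustrative counts for one census tract.
tract_counts = {"Tree Canopy": 30, "Grass/Shrubs": 20, "Buildings": 40, "Roads": 10}
ratio = lidar_green_ratio(tract_counts)
```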

Then we created a choropleth map in ArcGIS showing the green ratio by census tract block. Comparing this map with the terrain and vegetation maps from the street view model, we see a similar distribution of green ratios, which supports the accuracy of the street view analysis model.

Fig 12. Green ratios of each census tract block based on LiDAR Land Cover Dataset

Fig 13. Terrain and vegetation rate of each census tract block based on street view model

4. Results

Comparing our street view green rate maps with the LiDAR green rate map, we found that the street view vegetation rate map is the more similar of the two to the LiDAR map. Given this agreement, we use these two maps to analyze the green ratio of Manhattan.

4.1 Coefficients between greenery rate and neighborhood features

Moving forward, we used four different models to test the relationship between greenery rate and neighborhood features. As a preliminary analysis, we first used a seaborn heatmap to show the correlations between features, in order to avoid multicollinearity issues.

We found high correlations among the input features. The correlation between monthly housing cost and median household income, and the correlation between total population and occupied housing units, are both higher than 0.8. We dropped one feature from each of these pairs; Figure 14 shows the correlation matrix before (left) and after (right) this change.

Fig 14. Correlation matrix
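The multicollinearity screen can be sketched without any plotting library: compute the Pearson correlation for each pair of features and keep only one member of any pair above the 0.8 threshold. The greedy keep-first rule, the function names and the sample values are assumptions; the project inspected a seaborn heatmap and chose which feature to drop by hand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)  # undefined for constant columns


def drop_correlated(columns, threshold=0.8):
    """Keep features in order, skipping any whose |r| with a kept one exceeds threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[other])) <= threshold
               for other in kept):
            kept.append(name)
    return kept


# Made-up values for three features across five tracts.
columns = {
    "median_household_income": [1, 2, 3, 4, 5],
    "monthly_housing_cost": [2, 4, 6, 8, 11],
    "unemployment_rate": [3, 1, 4, 1, 5],
}
selected = drop_correlated(columns)
```

Because the rule keeps whichever feature appears first, the column order decides which member of a correlated pair survives.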

We tried three different regression models: OLS, Ridge and Lasso. OLS performed better than the other two. However, each of the three models explains only about 40% of the variation in the data. This is why we turned to a decision tree to calculate the importance of each feature. The importance of each feature to the greenery rate is shown in Figure 15.

Fig 15. Feature importance
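The share of the data that a regression model explains is commonly reported as the coefficient of determination, R². Assuming that is the measure behind the figure of about 40% quoted for the regression models, it can be computed from observed and predicted greenery rates as in this sketch (the function name and sample values are hypothetical):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# A perfect fit scores 1.0; always predicting the mean scores 0.0.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(observed, observed)
baseline = r_squared(observed, [2.5, 2.5, 2.5, 2.5])
```

A value near 0.4 therefore means the model does noticeably better than predicting the mean, but leaves most of the variation unexplained.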

4.2 Result and implications

4.2.1 Distribution inequality

From the street view vegetation rate map, we can conclude that the blocks around Central Park have a higher green rate and that the uptown green rate is higher than the downtown one. Midtown shows the lowest green rate in Manhattan. The green rate along the riverside is higher than in the blocks in the interior of Manhattan island.

This result shows that the street vegetation distribution of Manhattan is unequal and needs to be improved in the future.

4.2.2 Housing cost has high correlation

Combining the correlation analysis and the decision tree regressor result, we can see that the most important feature is ‘Monthly housing cost median’, which is negatively correlated with the green ratio. This means that blocks with higher housing costs have a lower green ratio, which suggests that developers of higher-cost housing in Manhattan did not pay attention to the green ratio of their housing. In some cases this may harm the health of the residents living there.

4.2.3 Disadvantaged populations have higher correlation

‘Population 60 years and over’ and ‘females enrolled in school’ are positively correlated with the green ratio, and these two features also rank among the more important ones in the decision tree regressor analysis. This shows that blocks with a higher green rate have more residents over 60 years old and more females enrolled in school.

In addition, ‘Percentage Population 16 years and over unemployed’ is negatively correlated with the green ratio, so urban planners and designers need to pay more attention to this population when planning.

4.2.4 Implications

From the results, we conclude that greenery should be distributed equally across neighborhoods. Urban operators and planners need to give special consideration to disadvantaged populations, such as elderly people and teenagers, who need more green space according to the analysis. Urban design should accommodate high density together with greenery for the sake of human health.

We can also identify which neighborhoods need greenery improvements through urban renewal projects, based on the green ratio map.

5. Limitations and future analysis

5.1 Project limitations

To assess the accuracy of the model, we compared its results with LiDAR aerial raster data by mapping both datasets visually. However, the image semantic segmentation has certain limitations and inaccuracies, and the dataset does not fully reflect what people perceive when they physically experience the neighborhoods.

In addition, the study area covers only the borough of Manhattan because of the large dataset and computing limitations. The results can only explain the situation within Manhattan, and there may not be enough data to explain the situation in other places well.

Finally, parks usually have a higher green ratio, which may skew the neighborhood-level correlation analysis.

5.2 Opportunities and Improvements for future analysis

For further analysis, this model can be used to analyze the relationship between green ratio and social equality in different cities and countries for comparison, in order to examine and suggest future urban planning and design strategies. The study's process can also incorporate physical and mental health data in future analysis. For example, the land surface temperature map of New York is useful for studying heat island effects and heat vulnerability in different neighborhoods in relation to the green ratio (USGS, 2019).

In addition, the mental health services map released by the Mayor’s Office of Community Mental Health can be used to analyze the impact of green spaces on mental health (NYC Open Data, 2022). Including more diverse datasets in the model will help to examine and justify the correlation between green space, social equity and health, thereby producing data-based guidelines and strategies for future urban planning and design.

References:

 

Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The Cityscapes Dataset for Semantic Urban Scene Understanding. https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2016cityscapes.pdf

Department of City Planning. (2020). Census - Download and Metadata. Www1.Nyc.gov. https://www1.nyc.gov/site/planning/data-maps/open-data/census-download-metadata.page

Department of Information Technology & Telecommunications. (2018, December 12). Land Cover Raster Data (2017) – 6in Resolution | NYC Open Data. Data.cityofnewyork.us. https://data.cityofnewyork.us/Environment/Land-Cover-Raster-Data-2017-6in-Resolution/he6d-2qns

Google. (n.d.). Street View Static API overview. Google Developers. https://developers.google.com/maps/documentation/streetview/overview

More, N., Nikam, V. B., & Banerjee, B. (2020). Machine learning on high performance computing for urban greenspace change detection: satellite image data fusion approach. International Journal of Image and Data Fusion, 11(3), 218–232. https://doi.org/10.1080/19479832.2020.1749142

NYC Open Data. (n.d.). NYC Street Centerline (CSCL). NYC Open Data. https://data.cityofnewyork.us/City-Government/NYC-Street-Centerline-CSCL-/exjm-f27b

NYC Open Data. (2022). Mental health of NYC. Cityofnewyork.us. https://mentalhealth.cityofnewyork.us/wp-content/uploads/2021/05/CMH-MapforWebsite-scaled.jpg

NYC Open Data. (2017). NYC LiDAR. Maps.nyc.gov. https://maps.nyc.gov/lidar/2017/

Ouyang, C. (Elvin). (2020, May 31). How to Query Google Street View Static API with Python (UPDATED IN 2020). Elvin Ouyang’s Blog. https://elvinouyang.github.io/project/how-to-query-google-street-view-api-with-python/

Sears-Collins, A. (2021, February 27). How To Detect Objects Using Semantic Segmentation – Automatic Addison. https://automaticaddison.com/how-to-detect-objects-using-semantic-segmentation/

Sun, Y., Wang, X., Zhu, J., Chen, L., Jia, Y., Lawrence, J. M., Jiang, L., Xie, X., & Wu, J. (2021). Using machine learning to examine street green space types at a high spatial resolution: Application in Los Angeles County on socioeconomic disparities in exposure. Science of the Total Environment, 787, 147653. https://doi.org/10.1016/j.scitotenv.2021.147653

Theeuwes, J. T. (2021, April 9). Creating Resilient Urbanism With Streetscape Design. Gensler. https://www.gensler.com/blog/creating-resilient-urbanism-with-streetscape-design
 

U.S. Census Bureau. (n.d.). Explore Census Data. Data.census.gov. Retrieved April 26, 2022, from https://data.census.gov/cedsci/table?g=0500000US36061%241400000&d=ACS%205-Year%20Estimates%20Detailed%20Tables

USGS. (2019). Urban Heat New York City | U.S. Geological Survey. Www.usgs.gov. https://www.usgs.gov/media/images/urban-heat-new-york-city
