Multiple Datasets Come Together to Shed Light on Urban Problems
Last updated
This is the story of two different datasets. The first comprises information about who owns what, and where. Today, property data serves as the digital backbone of many municipal operations. In New York City, the Information Technology Division at the Department of City Planning compiles tax lot information from several City agencies into the Primary Land Use Tax Lot Output dataset, known as “PLUTO.”
The second is 311 data. When New Yorkers dial 311, they want an answer to a question or a solution to a problem; the service is powered by digital infrastructure, artificial intelligence applications, and reams of data. Most 311 inquiries are informational, but one in ten calls, texts, and app-based messages to 311 triggers a request to a City agency for services ranging from restoring heat and hot water to filling a pothole. Each request is tracked digitally, and together they make up the roughly three million service requests logged each year in the 311 Service Request dataset.
So what do 311 and PLUTO have in common? Although the two open datasets are collected in different ways and serve different purposes, analyzed together they can tell the who, what, where, when, and why of urban issues.
Data scientists at the Mayor's Office of Data Analytics (MODA) have seen this firsthand. In one project, MODA built a statistical model to help the Department of Buildings and the Fire Department decide how to allocate 200 inspectors investigating complaints of illegally converted dwellings. Using PLUTO and 311 data alongside fire incident data, MODA found that homes with histories of tax delinquency, mortgage liens, and building violations reported through 311 are the most likely to contain illegal conversions. Based on this finding, MODA developed a model to send inspectors to the riskiest places first.
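The prioritization idea can be sketched in a few lines of Python. This is a hypothetical illustration, not MODA's model: it assumes each building has already been reduced to three indicators of the kind named above (a history of tax delinquency, a mortgage lien, and a count of 311 building-violation complaints), combines them with made-up logistic weights, and sends a fixed number of inspectors to the highest-scoring buildings first. The `Building` class, the `WEIGHTS` values, `risk_score`, `prioritize`, and the sample BBL (borough-block-lot) identifiers are all assumptions for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Building:
    bbl: str                    # tax lot identifier (borough-block-lot); illustrative
    tax_delinquent: bool        # history of tax delinquency
    has_lien: bool              # mortgage lien on record
    violation_complaints: int   # count of 311 building-violation complaints


# Hypothetical weights for illustration only. A real model would estimate
# these from historical inspection outcomes.
WEIGHTS = {
    "intercept": -3.0,
    "tax_delinquent": 1.2,
    "has_lien": 0.8,
    "violation_complaints": 0.5,
}


def risk_score(b: Building) -> float:
    """Logistic score in (0, 1); higher means more likely to be illegally converted."""
    z = (
        WEIGHTS["intercept"]
        + WEIGHTS["tax_delinquent"] * b.tax_delinquent   # bools count as 0 or 1
        + WEIGHTS["has_lien"] * b.has_lien
        + WEIGHTS["violation_complaints"] * b.violation_complaints
    )
    return 1.0 / (1.0 + math.exp(-z))


def prioritize(buildings: list[Building], capacity: int) -> list[str]:
    """Return the BBLs of the `capacity` riskiest buildings, riskiest first."""
    ranked = sorted(buildings, key=risk_score, reverse=True)
    return [b.bbl for b in ranked[:capacity]]


buildings = [
    Building("1000010001", tax_delinquent=False, has_lien=False, violation_complaints=0),
    Building("2000020002", tax_delinquent=True, has_lien=True, violation_complaints=3),
    Building("3000030003", tax_delinquent=True, has_lien=False, violation_complaints=1),
]
print(prioritize(buildings, capacity=2))  # ['2000020002', '3000030003']
```

Ranking by a single score and cutting at a fixed capacity keeps the allocation rule simple to explain; in practice the capacity would reflect the number of inspectors actually available (200 in the project described above).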
In another project with the NYC Tenant Harassment Prevention Task Force, MODA helped focus enforcement resources by testing whether potential signs of tenant harassment (including 311 complaints about dust from illegal construction and dirty conditions such as mold or standing water) could predict a building’s risk of losing rent-stabilized units. Properties that displayed those conditions were nominated to inspectors for a closer look. Like the open datasets that power them, the methodology and results of these analyses are public, published in MODA’s open-source analytics library and on its GitHub account.
Analyzing incident-level data (such as 311 service requests) alongside locational data (such as PLUTO data on a property’s physical characteristics, or Census data on neighborhood characteristics) pieces together a more holistic picture of city problems than either source provides on its own.
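A minimal Python sketch of that kind of join follows. It assumes each 311 request has already been matched to a tax-lot identifier (a BBL) so it can be linked to a PLUTO row; the field names `bbl`, `yearbuilt`, `numfloors`, and `complaint_type`, the toy rows, and the helper `join_complaints_to_lots` are illustrative assumptions and may not match either dataset's published schema exactly.

```python
from collections import Counter

# Toy rows standing in for the two datasets; field names are illustrative.
pluto = [  # one row per tax lot
    {"bbl": "1000010001", "yearbuilt": 1910, "numfloors": 5},
    {"bbl": "1000010002", "yearbuilt": 1985, "numfloors": 12},
]
service_requests = [  # one row per 311 request
    {"bbl": "1000010001", "complaint_type": "HEAT/HOT WATER"},
    {"bbl": "1000010001", "complaint_type": "HEAT/HOT WATER"},
    {"bbl": "1000010002", "complaint_type": "Noise - Residential"},
    {"bbl": None, "complaint_type": "Street Condition"},  # not tied to a tax lot
]


def join_complaints_to_lots(lots, requests, complaint_type):
    """Count one complaint type per tax lot and attach each lot's attributes.

    Behaves like a left join from lots: every lot appears, with a count of 0
    when no request matches it. Requests without a lot identifier are dropped.
    """
    counts = Counter(
        r["bbl"]
        for r in requests
        if r["bbl"] is not None and r["complaint_type"] == complaint_type
    )
    # Counter returns 0 for missing keys, so unmatched lots get a zero count.
    return [{**lot, "complaints": counts[lot["bbl"]]} for lot in lots]


for row in join_complaints_to_lots(pluto, service_requests, "HEAT/HOT WATER"):
    print(row["bbl"], row["yearbuilt"], row["complaints"])
# 1000010001 1910 2
# 1000010002 1985 0
```

Aggregating the incident data to one count per lot before attaching it keeps the result at the granularity of the locational dataset (one row per tax lot), which is usually the unit a follow-up model or inspection list works with.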
Agency | Dataset | Each row is
DCP | Primary Land Use Tax Lot Output (PLUTO) | A tax lot
311 | 311 Service Requests | A 311 complaint routed to an agency