NYC’s top data analytics guru Dr. Amen Ra Mashariki tells us how collecting the right info can prevent fires and save lives.
Dr. Amen Ra Mashariki. Image: Brooklyn Tech/Flickr
Sitting in traffic on the way to the Brooklyn Navy Yard on Thursday, I wondered to myself why I hadn't simply taken my bike instead. I was en route to interview Dr. Amen Ra Mashariki, the City of New York's chief analytics officer, determined to better understand how data analysis can help make living in this cramped city a little more pleasant for its nearly 8.5 million residents. After all, the purpose of the event where I caught up with Mashariki, Smart Cities NYC 2017, was to explore the intersection of "technology and urban life." Who better to ask how the reams of data the city collects can actually make a difference?
What follows is an edited and condensed recreation of my conversation with Mashariki, which took place on Thursday afternoon.
Motherboard: When I saw the title "chief analytics officer," I had no idea what that was in the context of a city employee. Can you explain what a chief analytics officer does and how you got here?
What was jumping into government like?
For me it was a learning experience, because I came in thinking the only people who go into government don't want to do work, or they're lazy or not smart. I was absolutely—my whole concept turned on its head! These are hard workers, committed people, extremely smart, who are experienced and knowledgeable in their fields, who have decided, "Hey I want to be maximally impactful." People in the private sector are the same way but in government you have to have a sense of patience.
It's more of a mentality like, "We're not planning for the next quarterly results, we're planning how to better society."
That's right! And that's a great segue into my role. My office, the way I describe it is, we're a no-cost data analytics consulting firm to the city. There are things that are long plays but there are things that city agencies need to do now. We need to find the top X-number of buildings that are not up to code. We need to find the places where the city is not performing. We need to identify a more efficient route when we do an emergency response or plow snow. So there are things that need to happen virtually immediately, right? When an agency decides that they're looking for a more innovative way to be more efficient and to be more impactful and to drive costs down, that's where we come in. We help city agencies determine what data sets they can use to help solve any number of problems that they may have.
What sort of questions do these agencies ask you when they want help with, say, snow removal?
The way that we function is that all a city agency has to do is essentially talk about their business challenge. Let's take one problem: one of the problems that we did a while back was to help fire inspectors identify buildings that were illegally converted [such as an apartment building licensed for only 10 units actually having 14 units instead]. Why do we want to know which buildings have been illegally converted? Because the business problem was, "We want to minimize the number of fires that exist in the city." We translated that into an analytics question that says, "If we identify buildings that have been illegally converted, you're taking gas and electricity to those additional units that you're not permitted to. So you have to do all sorts of splicing and connections and so forth, so maybe you've got your cousin or your uncle doing the work and they're not qualified to do it properly, which increases the chance of fire. So if we can minimize the number of illegally converted buildings we can minimize the number of fires."
The thing that we think about the most is emergency responses. When you look at the recent history of New York City, there's 9/11, Hurricane Sandy. The de Blasio administration is very concerned about making sure we have the right resources and infrastructure in place for emergency response. So my office thinks about emergency response from the standpoint of sharing data. Every month we do a thing called a data drill. Essentially we work with city agencies to create an emergency scenario and then we present that scenario. "This happened on this day, here are the circumstances, and here are the things that the city leadership need to know to be able to respond." We think about these agencies, whether it's the NYPD or FDNY, but then there's also questions about data that needs to get passed so that the leadership of these agencies have a better level of insight so that they can move the pieces where they need in an emergency situation.
What kind of pieces?
If there are downed trees, for example. Let's say there are X-number of downed trees in the city. How do we know which ones to go and remove first? Should we go and remove the downed tree because someone called it in first so it's at the top of the queue but it's in the middle of nowhere so it's not impacting traffic? Or maybe it's the people who called in 20th on the list but the tree is right in front of a building where people with disabilities live? Well, you should probably go there first.
Mashariki: So there's all these things around data and information that agencies need to know. During a drill, we practice what we call "data at the speed of thought." We want people who are responding to emergencies to have data at the speed of thought.
Were those conversations happening 10 years ago, 20 years ago, where the NYPD was talking to the Department of Buildings in terms of sharing data?
Not at the level that we're facilitating now, no. I always bristle when I hear people say, "Aren't there silos where this agency isn't talking to that agency?" You have people who have been in government for 20, 30 years who've worked at four different agencies. And because they've worked at four different agencies they know the people to call at those agencies when they need stuff. So there is a mechanism that exists, "Oh I know that guy at the agency, let me call him," but there wasn't a framework or an infrastructure across the board. There's always one-to-one sharing where this agency can share with that agency, but to create an environment where all agencies can share across everyone else, no, that wasn't happening.
Now we're talking about data-sharing but let me also add another piece in there. Let's say 10 years ago everyone was sharing data but there was something that no one was asking and that's whether or not the data was even good. That's my job to ask. Just because Agency A asks Agency Y for the data and they emailed it to them in an Excel spreadsheet does it mean that that data's going to be useful? It just means that they responded. Awesome, thank you, check, you responded, but is the data useful? Is it even in a format that I can understand?
So who's responsible for ensuring that the data that's being collected is quality data to begin with?
Mashariki: The agencies. The agencies do a good job but then, say, the Department of Buildings is a huge agency, so the question really should be who in that agency is working to ensure that that particular data is accurate. There isn't one "data guru" who oversees all the different sets of data. So in an emergency what happens is someone from City Hall calls the head of that agency. Someone from City Hall doesn't know the middle manager who oversees any particular dataset, but the head of that agency will, or can call on the Chief Information Officer to find that person. At that point you've already reached the threshold of how many people you should contact to get something out. So the overall question for us is, in an emergency situation what's the process for getting the right and best data? My sense is if you gave an agency a couple of days to find the best, highest quality data on any given topic they can do that, but if you give them 30 seconds are you going to get to that?
So it sounds like we're limited to moving at the speed of people? It's not so much, data flying everywhere, ahh! It's more, OK, who knows this and how quickly can we find this person?
That's exactly right. We lead with people and not data. It's all about whom you engage, how you engage them, and what tools you give them with which to respond. It's something I learned during my White House fellowship: government is well-equipped with the right people but we have to build out processes to give them the tools to be able to respond successfully.
And that doesn't seem all that different from any other large organization. You often have the right people but they don't have the tools, or even know where to get the tools, to be as efficient and productive as they fully have the ability to be.
So let's step back a bit. What kind of data does the city actually collect? I assume there's the usual things like crime statistics, car accidents, but what else?
That's a big question so here's how I'll answer it. If you go to our Open Data Portal we have over 1,700 datasets and the next highest city has like 800, maybe 900. And we've only just begun. As for specifics: location, location, location. Almost every single dataset that we have in the city revolves around location. Some of our bigger data agencies are the Department of Finance because they manage information around taxes and land use. The Department of Buildings has building information.
Every building in New York City has a Building Identification Number, and that BIN corresponds to a host of characteristics. Like you said, the NYPD collects crime statistics so that's a big dataset. The Department of Sanitation collects data about snow plow routing. The Department of Homeless Services sends us nightly the aggregate number of people in homeless shelters. What we don't get for all sorts of privacy reasons is data from the Department of Health, at least that we release to the Open Data Portal. But I would say buildings data is probably your most expansive dataset but that's primarily because buildings cut across all sorts of different agencies: Department of Finance for tax data, obviously the Department of Buildings, you've got Housing Preservation and Development, you've got NYPD that stores data.
What sort of data is there surrounding transportation? I know everyone hates the subway but de Blasio has taken great pains to remind people that the city doesn't control the subway [a state agency does]. I'm just wondering what role does data play in getting folks from Point A to Point B as quickly and as safely as possible?
We have Citi Bike locations. We get TLC [Taxi and Limousine Commission] locations, that's a big dataset. We've got Department of Transportation data in terms of crashes and fatalities. We also have a partnership with Waze where we get some data from them that can be useful. DOT also has cameras where they can track traffic.
It's funny you say Waze because I'm wondering if there are any other startups or private companies either working with the city to improve the data or improve how it's accessed?
My office did a project called Business Atlas where we took federal, state, and city data, things like liquor license data and a bunch of Department of Consumer Affairs data. We took all this data and created a view of businesses such that you could get a better sense of market research if you were trying to open up a business in that area. You'd get a sense of median income, median age, what businesses existed in that area, and so on and so forth. We worked with a startup called PlaceMeter that uses sensors to measure foot traffic information around that area. So we created this map where you could put in an address and immediately get a sense of what the business environment is like in that area. We also have strong partnerships with academia.
So the theme of this event is the future. Things are whatever they are today, but how can we make things better tomorrow? When you leave your role as chief analytics officer, what would you like your legacy to be?
That's an easy answer. Two things. One is, I've already said that my job is to grow the competency of the Mayor's Office of Data Analytics [MODA] in the short term such that we can lessen its role across the city in the long term. And the concept behind that is, creating a culture will be more sustainable than saying, "As long as there's a MODA these things will get done." Because one thing we know about government is, there may not be a MODA depending on who gets elected years down the line. So my job is not to ensure that MODA maintains this leadership role and is always this hub of analytics excellence. My job is to spread analytics excellence across the city within these agencies.
So it almost sounds like a successful MODA under your leadership is one that doesn't even really need to exist.
That's exactly right. And the second is, oversight of our open data strategy. When I came in we published this vision called Open Data for All. And Open Data for All says that, yes, the city has this data that we want to release to New Yorkers but it's not useful if only a small, chosen, statistically minded few can use it. We need to have it so that all New Yorkers from all walks of life know that this exists and know how this exists, and not only that, but use it to their advantage. We want it so that all New Yorkers can say, "Having access to this data can do things like help me start a business or can help me figure out a smart way to engage with my community." A strong implementation of Open Data for All and a growing understanding that everyone in New York, everyone, should have access to the resources of the city.