Monday, November 28, 2011

RIOC Operations Committee Meeting Today After 4 Months To Discuss Ongoing Construction and Infrastructure Projects - How About Publishing Roosevelt Island Data On Web For New Apps?

The Roosevelt Island Operating Corp (RIOC) Operations Committee will be meeting later this afternoon for the first time since July 29, 2011. According to RIOC:

PLEASE TAKE NOTICE that a meeting of the Operations Advisory Committee of the RIOC Board of Directors will be held on Monday, November 28, 2011 at 5:30 p.m. at the RIOC administrative office, 591 Main Street, Roosevelt Island, New York.

The Committee will meet to discuss ongoing construction and infrastructure projects and planning.

One area of Roosevelt Island operations where I would like to see RIOC take the lead is opening up the data it collects and sharing it with the public. Here's a very interesting article on cities and municipalities sharing their data with the public on the web. An excerpt:

... Across the country, geeks are using mountains of data that city officials are dumping on the Web to create everything from smartphone tree identifiers and street sweeper alarms to neighborhood crime notifiers and apps that sound the alarm when customers enter a restaurant that got low marks on a recent inspection.

The emergence of city apps comes as a result of the rise of the open data movement in U.S. cities, or what advocates like to call Government 2.0....

...  In New York City, Mayor Michael Bloomberg is pushing open data as a key component of his effort to transform the city into Silicon Valley's chief rival.

The city is in its third year of sponsoring the NYC BigApps competition, in which programmers compete to build useful new apps based on city data.

At a recent weekend "hackathon," first prize went to "Can I Park Here?," an app that matches location information from a user's smartphone to a database of city parking regulations to cut through the clutter of confusing parking signs with the message "Don't park here!"...

Red Bus, Tram and subway ridership, street and Motorgate parking, and event scheduling are examples of Roosevelt Island data that RIOC collects and could publish on the web. We have some very smart techies on Roosevelt Island. You never know what they might come up with that would be beneficial to the community.

Here's a video from Gov Fresh showing NYC Chief Digital Officer Rachel Sterne demonstrating how open digital info is changing the delivery of services to New Yorkers.

[YouTube video: Open Data Government Innovation]

RIOC already uses the RI 311 See Click Fix reporting system and Next Bus Red Bus GPS tracking info, and has met with the team behind the Roadify transportation app. All of these innovative digital initiatives were started by former RIOC Director Jonathan Kalkin.

Kudos to RIOC for recently winning a NY State award for innovative use of digital technology. That's a great start. We can do more.

An example of what a Roosevelt Island resident can produce is this Android Cell Phone Red Bus Tracking App from Vini Fortuna.


residential said...

"benefit to the community"? What? You want RIOC to do what?

Frank Farance said...

[Preface: this is a data-centric technical comment]

While I'm in favor of making information (and its data) available, I'll pick up on the comment from "residential", who said: "You want RIOC to do what?"  For a moment, pretend you are Mr. Moreo, RIOC's Director of Information Technology, who might ask: "By agreeing to make the data available, what am I committing RIOC to do?"  This is not a simple question. Note 1: Mr. Moreo has not offered opinions on this; I'm just suggesting hypothetical conversations that allow us to see it from RIOC's perspective.  Note 2: I work on a committee that helps Federal agencies share their datasets with the public for just these kinds of purposes.

Let's say Mr. Moreo offers a data feed that provides GPS-like data at one-second intervals for his buses and someone builds a public software application based upon that data, e.g., making better predictions than NextBus (NextBus has its shortcomings). In fact, this public software application is so cool that it can guesstimate how full the bus is (and if you can get a seat) by tracking acceleration and velocity.

Over the next several months Mr. Moreo discovers that he can use this data to coach some of the bus drivers to accelerate/decelerate more gradually, and to drive at the right speeds at certain points on their route.  But to accomplish this, he needs higher quality data, which means fewer lost data points.  So he works with the vendors and gets a reliable data stream, which is now delayed by 5 minutes (longer latency), but has no missing data points (higher quality).  Mr. Moreo is happy and, with these data modifications, his analysis software helps improve RIOC's operations.

Unfortunately, the bus prediction capabilities of the public software application are way off (people miss the buses by 5 minutes) and using earlier data makes the predictions unreliable.  The software developer might complain to Mr. Moreo "Hey, you changed the data", so what should Mr. Moreo do?

This is a tough problem.  Let's say that Mr. Moreo decides to provide a second data feed (lower latency, but less quality), which is somewhat like the original feed.
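[Illustration] To make the latency/quality trade-off concrete, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `Sample` record, the function names, the 300-second delay, and the idea that "lost" data points are really readings that arrive late. It is not a description of RIOC's or NextBus's actual feeds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout, invented for this illustration.
    measured_at: int   # second at which the bus position was measured
    arrived_at: int    # second at which the reading reached the server
    position_m: float  # distance along the route, in metres


def low_latency_feed(samples: List[Sample], now: int) -> List[Sample]:
    """Publish whatever has arrived by `now`; stragglers appear as gaps."""
    ready = [s for s in samples if s.arrived_at <= now]
    return sorted(ready, key=lambda s: s.measured_at)


def delayed_feed(samples: List[Sample], now: int, delay_s: int = 300) -> List[Sample]:
    """Publish only readings measured at least `delay_s` seconds ago,
    which gives late arrivals time to land before anything is released."""
    cutoff = now - delay_s
    ready = [s for s in samples
             if s.measured_at <= cutoff and s.arrived_at <= now]
    return sorted(ready, key=lambda s: s.measured_at)


def missing_seconds(feed: List[Sample], start: int, end: int) -> List[int]:
    """List the seconds in [start, end] for which the feed has no reading."""
    seen = {s.measured_at for s in feed}
    return [t for t in range(start, end + 1) if t not in seen]


# Ten one-second readings; the one for second 3 is held up until second 200.
readings = [Sample(t, 200 if t == 3 else t + 1, 5.0 * t) for t in range(10)]
```

With these made-up readings, the low-latency feed at second 10 is current but has a hole at second 3, while the delayed feed shows nothing at second 10 and all ten readings at second 310. Neither feed is the "right" one; each suits a different consumer.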

Then, one of the GPS units needs to be replaced and the data format of the positioning data changes.  While this change might have no effect on RIOC's operations, it might make the data incompatible for the public software application.  Should Mr. Moreo provide translation to the original data format?  And then several other GPS units are replaced and the new GPS format turns out to be more useful for other kinds of public software applications.  Should Mr. Moreo provide a third data feed?
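[Illustration] The translation question can be sketched in a few lines of Python. Both record layouts here are hypothetical (a JSON "new" format and a comma-separated "old" format, with invented field names), as is the function name; real GPS units emit their own formats. The sketch only shows the kind of shim that somebody, whether RIOC or the developer, would have to write and then maintain.

```python
import json


def new_to_old(record_json: str) -> str:
    """Translate one record from the hypothetical new JSON layout
    into the hypothetical old 'bus,timestamp,lat,lon' layout."""
    rec = json.loads(record_json)
    return "{},{},{:.6f},{:.6f}".format(
        rec["vehicle"],        # assumed field name for the bus identifier
        rec["ts"],             # assumed field name for the Unix timestamp
        rec["pos"]["lat"],     # assumed nested position fields
        rec["pos"]["lon"],
    )


example = '{"vehicle": "RB3", "ts": 1322519400, "pos": {"lat": 40.7614, "lon": -73.9496}}'
old_style = new_to_old(example)
```

The code itself is short. The commitment is in keeping it correct each time another unit is swapped out, and in deciding who carries that work.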

I ask these questions because we might be going down the slippery slope of RIOC committing to provide additional services.  If RIOC does provide a data *service*, then what it is offering is no longer just access to the raw data.

This problem is a common one for government entities: the desire to provide more information (and its data) to constituents.  The Freedom of Information processes can provide some guidance on striking a balance: it is reasonable to provide copies of records (e.g., access to data, access to financial records, access to documents), but it is unreasonable to ask the agency to provide additional services above "copying" (e.g., data conversion, breakdowns of a set of financial transactions, summaries of documents).

In other words, it might be possible to have access to the raw data (not a RIOC formalized data *service*), but it might not be useful for all the applications we'd want to build; and the data might change over time, which might affect its usefulness to us.

Not as simple as thought at first glance.  Unclear what the Right outcome is.

Jesse Webster said...

Look to San Francisco's transportation agencies, BART and the SFMTA, for a good example of how an agency can begin offering these types of datasets to developers while avoiding what you describe, which is essentially "scope creep". 

In the San Francisco agencies' process, developers register for access to data feeds. There are disclaimers associated with access to this data, and translation of future changes is the responsibility of developers, as it should be.

The example you gave is willfully convoluted, not to mention unlikely. 

Even if RIOC started offering this feed, and it was consumed by an application written by an external developer, and RIOC subsequently changed the feed rendering the developer's application useless, RIOC could (and should) disclaim responsibility for providing translations to older formats. 

Software being software, the developer could easily write his or her own translation to maintain the usefulness of the data for that particular app and distribute the update to the app's users.

It's clearly not RIOC's job to predict or manage the consequences of every single eventuality associated with every potential use of a data feed. A well-designed developer program could define and maintain a reasonable scope.

The kind of technical fear-mongering demonstrated in your post only serves to give technophobic government bureaucrats an excuse for their complacency. Let the nerds have the data, and let the nerds figure out how to use it and how to deal with future complications.

Frank Farance said...

It's not fear-mongering but normal professional practice to consider the engineering, software, management, policy, etc. concerns in the requirements and planning phases and *before* implementation.  Because you've used the term "scope creep", I'm guessing you already knew that.

You say "The example you gave is willfully convoluted, not to mention unlikely", yet in my experience these kinds of concerns have arisen in virtually every real-time time-series data source in transportation, military, communications, financial markets, etc.  Besides, even if it were convoluted, it is good management and engineering practice to consider the possibilities to make informed decisions -- I presume you dislike programmers who don't check their arguments to functions/methods, right?

As for "Software being software, the developer could easily write his or her own translation to maintain the usefulness of the data for that particular app and distribute the update to the app's users", that's a naive software lifecycle strategy: one that assumes code is free, it can be developed immediately, testing/QA is free and quick, and distribution is effortless -- even with open source, those aren't good assumptions (e.g., Firefox and Thunderbird).  For production code, typically, the maintenance cost exceeds the development cost by an order of magnitude.

Your statement of "Let the nerds have the data, and let the nerds figure out how to use it and how to deal with future complications" is blithely optimistic.  I heard Google took this approach with Department of Labor's Bureau of Labor Statistics' (BLS) data to determine Unemployment Rate and other data products.  Google couldn't get it right because you need people who understand the data to interpret and process it, not just nerds and computers thrown at the problem.  Sounds like the same kind of hype as the Semantic Web.

By the way, I am one of those nerds who writes code on both the producer and consumer sides.  For us nerds that think about data seriously, the issues I raise are real concerns, especially if you're a government entity that enshrines its operations in policies and business processes.

As for RIOC, collaboration seems to work better when we put ourselves in their shoes and try to understand their perspective.  I note that WE are much more receptive to RIOC when they put themselves in OUR shoes and understand OUR perspective (and we encourage them to do so).

If you compare budgets, BART is a $600+ million transportation agency that runs a railroad among other things.  RIOC is an $18 million housing development agency that runs a tram and a handful of buses in a one-horse (one-street) town.  I make that comparison not to diminish RIOC, but to point out that RIOC has a very tiny department for information and communication technologies as compared to BART.  So coming up with a Developer Program, registrations, and such, while possible, would have significant costs/efforts for RIOC, whereas for BART I'm guessing a similar program is just a drop in the bucket.  I'm not saying it can't be done for RIOC, but I think you should have expectations in line with what is possible and reasonable for RIOC.

My comment was not written to give RIOC an excuse to not solve the problem.  It was intended to identify some of the boundaries of the problem, to identify some of the key decision points, and to leave it open for areas of discussion and compromise -- and maybe some progress with RIOC.  You've pointed out that developers might have to live with a disclaimer that the data might change (I agree), and that developers might have to register (a good idea because it can put limits on the data pipe required).