Feedspot: RSS Reader for Google Reader Diaspora

Feedspot: A New RSS Reader

There was plenty of hand-wringing when Google announced that it was ceasing support for Google Reader. As is somewhat typical with Google’s project kills, it was a relatively precipitous decision that had analysts scratching their heads and users scurrying for alternatives.

Feedspot is one of several browser-based Really Simple Syndication (RSS) readers that offers features similar to Google Reader, and — this was critical for many users — includes Outline Processor Markup Language (OPML) import. OPML was the format used to export RSS feeds from Google Reader, as well as for other applications, including the venerable Microsoft Outlook.
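OPML itself is a small XML dialect: folders are nested outline elements, and each feed is an outline element carrying an xmlUrl attribute. A minimal sketch of pulling feeds out of a Reader-style export (the sample document and folder names here are invented for illustration):

```python
import xml.etree.ElementTree as ET

def feeds_from_opml(opml_text):
    """Return (folder, title, xmlUrl) tuples from an OPML export.

    Google Reader nested feeds inside <outline> folder elements;
    actual feed entries carry an xmlUrl attribute.
    """
    root = ET.fromstring(opml_text)
    feeds = []

    def walk(node, folder):
        for child in node.findall("outline"):
            if child.get("xmlUrl"):
                feeds.append((folder, child.get("title", ""), child.get("xmlUrl")))
            else:  # a folder: recurse with its title as the new context
                walk(child, child.get("title", ""))

    walk(root.find("body"), "")
    return feeds

sample = """<opml version="1.0"><body>
  <outline title="MusicTech">
    <outline title="Create Digital Music" xmlUrl="http://example.com/cdm/feed"/>
  </outline>
</body></opml>"""

print(feeds_from_opml(sample))
```

A script like this is also a quick sanity check before handing an export to a new reader: it shows exactly which feeds and folders will survive the migration.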

As one of the refugees from Google Reader, I had accumulated a modest set of hierarchically structured feeds spanning several engagements since 2005. Some of the folders were quite deep (e.g., “MusicTech” held 77 feeds), while others held just a few. I was a fairly energetic user of folders for structuring as well, and it is easy to imagine that others had even more extensive lists. In total, there were around 750 RSS feeds. These were successfully imported into Feedspot in around 20 minutes in the middle of the day EST (US).

Feedspot has the feel of a relatively mature application. The small touches are part of this impression, such as the “tooltips.” An example is shown in the screenshot: “Unread article: A blue triangle in the top left corner indicates unread article. . .”  Other examples include the list-view toggle and the rightmost “Feedback” button leading to a UserVoice dialog. Feedspot also allows building of RSS collections — essentially, feeds of feeds, at the level of folders.


List View Toggle


UserVoice Feedback

A so-called “modern” application must integrate numerous sharing opportunities, and Feedspot follows this trend. Features for sharing content are provided in several spots [sic]. Users can follow one another’s collections (“follow all of my stuff”) and, of course, share individual articles.

There are still some rough edges here and there. For example, it was unclear how to place a manually entered feed into a folder, or how to create a hierarchy more than two levels deep, if that is supported at all.

RSS content is sufficiently important that I have placed my post-Reader bets on more than one pony.  Feedspot is one of them. It deserves serious consideration as a Google Reader replacement.

Founder on the Spot

I corresponded with Feedspot founder Agarwal about the company. He disclosed that Feedspot was built using agile methods, and that it runs on a LAMP stack. His business goal? He says he always wanted to build “a consumer internet product.” Why RSS? Because, he wrote, “for some users an RSS reader is a must-have product.” But Agarwal believes it is possible to “take an RSS reader to mainstream consumers.”

Future of RSS?

The cancellation of Google Reader had some pundits predicting the end of RSS. Some consider the Facebook-dominated landscape to be superseding the lowly RSS. The RSS button, they say, is losing ground to “Follow” and “Like” buttons. Indeed, there is a rich future ahead for the underlying data from those interactions. Still, it is safe to say that pronouncements of the death of RSS are premature. RSS is widely used in content management systems (e.g., Blogger, WordPress, SharePoint and Joomla) to import links to relevant articles. This is true for externally produced content, of course, but perhaps it has even greater value as a rapid information aggregator for smaller scale intranets whose limited staffing prevents more sophisticated schemes.

RSS, partly because of its UserLand roots,  offers a simple but flexible framework for information management. It doesn’t deliver a directly usable ontology from an automation standpoint, but its wide adoption gives users great leverage to employ ad hoc schemes.


UserVoice Dialog Box

The addition of tagging features in Feedspot strengthens the quasi-ontology capability in RSS. If judiciously used with a controlled vocabulary, search can be more fruitful. As longtime users of Gmail can attest, having to choose between folders and tagging was an uncomfortable either/or decision. Eventually Gmail would offer both for classifying email. Having this capability for managing RSS is helpful.
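The value of pairing tags with a controlled vocabulary is easy to illustrate. A toy sketch (the vocabulary and helper function are invented, not Feedspot features): free-form tags drift (“ml”, “machine-learning”, “ML”) and fragment later searches, so off-vocabulary tags are rejected up front:

```python
# A small controlled vocabulary keeps tags searchable over time.
VOCAB = {"musictech", "security", "semweb", "bigdata"}

def tag_feed(tags):
    """Normalize tags and accept only those in the controlled vocabulary."""
    tags = {t.lower() for t in tags}
    unknown = tags - VOCAB
    if unknown:
        raise ValueError(f"not in vocabulary: {sorted(unknown)}")
    return sorted(tags)

print(tag_feed(["MusicTech", "BigData"]))
```

The discipline costs a little friction at tagging time but pays off when search and aggregation depend on consistent labels.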

There is more fertile ground in the reference community, such as RSS feeds that are supported in ResearchGate, CiteULike and academic publishers. If feed schemas become more sophisticated, feed users could deploy them more rapidly. Content publishing based on feeds would more often reach appropriate reader communities. Content delivery at present is notoriously hit-and-miss, especially since bloggers can readily veer off the ostensible topic of the blog.

These are the same growing pains that have made the semantic web’s progress so unsteady. But Feedspot and tools like it are perfectly good hammers for the right kind of nail.


Tagging Feature



Analyzing the Beast that is Cybersecurity

Hyatt Regency walkway collapse (credit: Wikipedia Commons)

What sort of beast is “cybersecurity” anyway?

Failure Analysis

Is it simply a variation of software failure?  According to this analysis, a security lapse is a software engineering failure, not technically different from an unintended “404” error or an “uncaught” exception.

Protection Analysis

Is it simply a failure to implement corrective measures? This analysis likens cybersecurity to physical security.  Facilities such as military bases or electric power plants are vulnerable targets. Rather than try to remove all the points of vulnerability, a virtual “layer” of physical security is drawn around the facility. The Department of Defense Physical Security Program provides a useful glimpse into this approach. Consider DoD 5200.08-R. A version last updated in 2009 is hosted by DTIC. To some extent, there is a reasonable analogy to protecting software.

Architecture Analysis

Is it a design failure? In architecture, it is not uncommon for architects to receive the blame for collapsed or otherwise unsuccessful buildings. Consider, for instance, the failure of a walkway in the Kansas City Hyatt Regency hotel in 1981, in which 114 people were killed; initial blame settled on the architects. A more nuanced view recognizes multiple sources of responsibility, including project sponsors, customers, auditors, and sometimes public officials and even politicians. This was the analysis made by K. Bristol in a 1991 study of the Pruitt-Igoe towers project in St. Louis.

Regardless of which of these approaches is chosen, the relative contributions of alternative models for failure should be taken into account.  There is a tendency to focus excessively on the specific lapse (e.g., buffer underflow). Issues such as engineer training, IDEs, development frameworks, test environments and constraints imposed by sponsors and other stakeholders also deserve investigation, if not blame.

Celebrity’s Anonymous Pen Name ‘Outed’ by Software

JGAAP (Java Graphical Authorship Attribution Program)

The role that software plays in stylistic analysis of text is perhaps less surprising to high school and college students than to the general public. The former must submit the essays they write to style analysis performed by software that looks for plagiarism and sometimes also makes quality assessments.

In the recent outing of J.K. Rowling as the writer behind the pen name Robert Galbraith, it was mentioned that software had been used to analyze the text of the Galbraith novel. There exists a family of software used by academics for “authorship attribution”: to discover, for example, whether a recently discovered manuscript was a missing chapter of Don Quijote (a fabricated example). One of these applications is JGAAP, for Java Graphical Authorship Attribution Program. The JGAAP wiki page explains the project as

. . . Java-based, modular, program for textual analysis, text categorization, and authorship attribution i.e. stylometry / textometry. JGAAP is intended to tackle two different problems, firstly to allow people unfamiliar with machine learning and quantitative analysis the ability to use cutting edge techniques on their text based stylometry / textometry problems, and secondly to act as a framework for testing and comparing the effectiveness of different analytic techniques’ performance on text analysis quickly and easily. JGAAP is developed by the Evaluating Variation in Language Laboratory (EVL Lab) and released under the AGPLv3.

How this was accomplished was explained by Patrick Juola, one of the two academic investigators credited with the analysis (following initial suspicions raised by reporters at the Sunday Times), in the blog Language Log. Juola refers to this subdiscipline as “forensic stylography.”

A one-paragraph extract from Juola’s blog post follows. Note that, in the usual sense of the word, the analysis doesn’t look directly at “meaning.”

The heart of this analysis, of course, is in the details of the word “compared.” Compared what, specifically, and how, specifically. I actually ran four separate types of analyses focusing on four different linguistic variables. While anything can in theory be an informative variable, my work focuses on variables that are easy to compute and that generate a lot of data from a given passage of language. One variable that I used, for example, is the distribution of word lengths. Each novel has a lot of words, each word has a length, and so one can get a robust vector of <X>% of the words in this document have exactly <Y> letters. Using a distance formula (for the mathematically minded, I used the normalized cosine distance formula instead of the more traditional Euclidean distance you remember from high school), I was able to get a measurement of similarity, with 0.0 being identity and progressively higher numbers being greater dissimilarity.
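Juola’s word-length variable is easy to sketch in code. The following is only an illustration of the general idea, not his actual pipeline (the sample texts are invented): it builds word-length distributions for passages and compares them with a normalized cosine distance, where 0.0 means identical distributions and larger values mean greater dissimilarity.

```python
import math
import re

def length_distribution(text, max_len=20):
    """Fraction of words having exactly 1..max_len letters."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = [0] * max_len
    for w in words:
        if len(w) <= max_len:
            counts[len(w) - 1] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine_distance(u, v):
    """1 - cosine similarity; 0.0 for identical distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

a = length_distribution("the quick brown fox jumps over the lazy dog")
b = length_distribution("the quick brown fox jumps over the lazy dog")
c = length_distribution("antidisestablishmentarianism notwithstanding heretofore")

print(cosine_distance(a, b))  # identical texts -> ~0.0
print(cosine_distance(a, c))  # very different word lengths -> much larger
```

Real stylometry combines several such cheap, high-volume variables (word lengths, function words, character n-grams) rather than relying on any one of them.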


Cool Socnet Visualization from MIT’s Immersion Project

A previous post considered some practical implications for privacy and government surveillance stemming from the Snowden revelations about the Prism program. The point was made that some people who think they have nothing to hide could easily become ensnared in webs not of their own making, and could find it difficult to untangle themselves.

Interest in metadata patterns in social networks is not limited to the NSA. Prism is one of a number of academic, Homeland Security and Department of Defense programs that have studied how to make sense of social communication patterns to identify and track suspects. One of these is MIT’s Immersion project.

Following a tip from Slashdot, the Immersion project was given the keys to the author’s hyperactive Gmail account (inbox ≈ 169,000 messages, 120 filters, 250 labels). Immersion analyzes a Gmail account without directly accessing one’s Gmail password.

The attached images were produced by Immersion after analyzing 277,843 emails.  As the MIT project team explains,

Once you log in, Immersion will use only the From, To, Cc and Timestamp fields of the emails in the account you are signing in with. It will not access the subject or the body content of any of your emails.

The point? As Slashdot’s “Judgecorp” points out, Immersion gives even a casual observer a sense for what the NSA Prism initiative could do with metadata.
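The power of header-only analysis is easy to demonstrate. A minimal sketch (addresses invented) that builds a weighted social graph using only From/To/Cc fields, much as Immersion describes, with no subjects or bodies in sight:

```python
from collections import Counter
from itertools import combinations

# Header-only records: (from, [to + cc]) -- no subject, no body.
headers = [
    ("alice@example.com", ["bob@example.com", "carol@example.com"]),
    ("bob@example.com",   ["alice@example.com"]),
    ("alice@example.com", ["bob@example.com"]),
]

edges = Counter()
for sender, recipients in headers:
    # Every pair of people on the same message gets a weighted edge.
    for pair in combinations(sorted({sender, *recipients}), 2):
        edges[pair] += 1

for pair, weight in edges.most_common():
    print(pair, weight)
```

Even three messages are enough to rank relationships by strength; scale that to 277,843 emails and the contours of a life emerge from metadata alone.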

Immersion can also objectively respond to your mother’s “Why don’t you ever write?” complaint. When used to analyze a single contact, Immersion produces a graph of interactions by year, as depicted in the screenshots.

Yes, writing my sister more often would be a good idea.

As often highlighted at GlitchReporter.com, things in information technology can sometimes go wrong. Spam, misaddressed email, malware or sheer coincidence could put your name on the receiving end of an arrow in an Immersion diagram.

First posted at Port Wash Patch



Nothing to Hide? Or Afraid of a ‘Metadata Sweep’?

FBI Terrorist Watch List Flowchart

This post first appeared on the Port Washington Patch.

In a recent discussion of the Edward Snowden affair with family members, three basic attitudes toward the government’s selective spying on U.S. citizens emerged:

The Innocence Argument “I have nothing to hide, so I don’t care what the federal government wants to know about me.”

The Privacy Argument “The government should keep out of my personal life.”

The Fallibility Argument “The federal government’s systems can’t (yet) be trusted to avoid false positives and expeditiously remediate errors.”

Blogger Jeff Jonas noted that:

The underlying problem is that the information on these watch lists typically have low fidelity (i.e., limited data points like only name and date of birth).  If you want to see an example of a government watch list check out the Office of Foreign Asset Control’s Specially Designated Nationals Watch List.  You will find this frequently contains only a name, date of birth and place of birth.

Consider the case of Sean Kelly, who somehow found himself on the TSA watch list a few years ago. The TSA has since rolled out “Secure Flight,” but even a cursory glance at the system’s complexity and scale (2 million passengers daily moving through some 450 airports across the U.S.) instills a healthy skepticism that false positives can be avoided.

As the public debate over Snowden and PRISM rages on, consider the ways in which a citizen’s name could appear in a possible watch list data set:

  • A friend’s email list was corrupted by a spambot and you were sent an email from a person on the watch list
  • Your name was adjacent to a person on the watch list and a DHS analyst accidentally selected your record
  • Your name was misspelled in the government records
  • You used to live at an address once occupied by a person on the list
  • You have a common name
  • The software compiling the lists and/or extracting candidate metadata contains undetected bugs that have compromised data integrity (see GlitchReporter.com for examples)
  • A disgruntled insider within the government could scramble the underlying data, a problem which could remain undetected for months or even years
  • Recourse software, designed to give citizens an opportunity to appeal false positive classifications when disclosed, is inadequately tested
  • Across-the-board government cutbacks have left programs understaffed, and software supporting citizen recourse systems is no longer well maintained
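The low-fidelity matching Jonas describes is easy to reproduce. A toy sketch (all names, dates, and the helper functions are invented for illustration) that screens passengers against a name-plus-birth-year watch list and promptly flags an innocent namesake:

```python
def normalize(name):
    """Crude normalization: lowercase, strip punctuation, drop middle names."""
    parts = name.lower().replace(".", "").replace(",", "").split()
    return (parts[0], parts[-1]) if parts else ("", "")

# Low fidelity: only a name and a birth year, as on real watch lists.
watch_list = [("sean kelly", 1970)]

def screen(passenger, birth_year):
    """Return True if the passenger matches a watch-list entry."""
    first_last = normalize(passenger)
    for listed, year in watch_list:
        if first_last == normalize(listed) and abs(birth_year - year) <= 2:
            return True  # flagged -- quite possibly a false positive
    return False

print(screen("Sean M. Kelly", 1971))  # a different, innocent Sean Kelly
print(screen("Jane Doe", 1971))
```

With only two data points per entry, every namesake born within a couple of years becomes a hit; that is the structural source of false positives, before any of the bugs listed above even enter the picture.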

Recently the public radio program This American Life chronicled the sequence of bureaucratic bumbling, auto-responders and inadequate supervision and training that apparently led to the beheading of an Iraqi national who had worked for a U.S. contractor.

Imagine that your name or account number appeared on a search of the metadata collected as part of the Prism program. Assuming you had recourse, consider the sort of correspondence needed to extricate yourself from the web of trouble in which you find yourself entangled. It is all too easy to imagine receiving messages from government agencies worded thusly:

Kindly be informed that we checked your case and found that it is in processing pending verifying your employment documents. Once it is completed we will move forward with your case. Your patience does assist us in accelerating the process.

The Orwellian message was repeated often, even after the Iraqi national had reportedly provided the verifications requested.

The Fallibility Argument isn’t a Paranoia Argument. It merely recognizes the limitations of systems created on this scale and run by a very large organization with unclear oversight. It can be assumed that some of the deficiencies have been corrected, but the Department of Justice Inspector General report on issues at the FBI’s Terrorist Screening Center is worth reading. After all, recent revelations about Prism indicate that there are “117,675 active surveillance targets.”

If a two-year-old toddler could end up on a list, it’s conceivable that the FBI’s data is telling them that one of those targets knows you.

Bush the Elder’s “Vision Thing”

A colleague suggested a TED talk by Simon Sinek on “leadership.” Can any talk or book about “leadership” be credible?
I am suspicious of someone who casually proposes that humans are motivated “by biology not psychology,” as if these could be cleanly partitioned off from one another.

I can perhaps overlook that oversimplification.

But most organizations “believe” many things. Concurrence of employee/vendor teams, if it could be measured, would surely cut across many beliefs and ideas. It would be difficult to prove that what motivates people is directly causative of the success of a given enterprise. Being motivated can lead to bad as well as good results. There are good and bad, successful and unsuccessful visions that can be communicated (or mis-communicated) to prospective cult members. Many a startup with great vision, collective commitment, and a focus on “why,” not just “what,” will fail to make the cut.

Inspirational, powerful rhetoric is great (and its absence is painful), but show me what Sinek in his talk disparagingly refers to as “the 12 point plan,” too. A core principle in understanding how people operate, I believe, is the notion that knowledge, and the pursuit of it in an enterprise, is intersubjective. That means, at some level, distrusting not only the expressed beliefs of others, but one’s own instincts to believe.

Maybe Sinek is simply reiterating what Bush the Elder was said to have commented about “the vision thing.” Give it its due, but no more.

Recruiting #fail: On Recruiting for Proficiency


What follows is a position description received this month from a firm — not a recruiter.

Required Technical Skills:

  • Proficiency in all MS Office applications including MS Project
  • Front end development (HTML, Flash, Ajax, Javascript – templates)
  • Back end development (XML, HTTPS, Web Services, Web dav, data mapping)
  • Experience with implementing and managing Demand Ware solutions a plus, Demandware Business Manager, DemandWare UX studio (Eclipse based development environment), DemandWare control center
  • Clear understanding of web technologies like Java, DotNet, PHP, Ruby, SQL, MYSQL, MSSQL, HTML 5, Javascript, IIS, Apache, Performance fine tuning techniques, Flash, AJAX, Mobile platform, CRM, Web services, XML
  • Understanding of Informatica, SAP, Biztalk is a plus

A piece of work, but not about getting work done.

Will $100M Trickle Watson Down to SMB Enterprises?

IBM Watson

Bloomberg News reported that IBM plans to invest an additional $100 million in its Watson technology. Earlier in 2011, Watson exceeded expectations for artificial intelligence by easily overwhelming two Jeopardy! champions on national TV. While Watson-like technologies could be used in a variety of settings (e.g., network management or health care), the steep investments IBM has already made suggest that the global services giant has its eye on a revenue stream whose major tributaries are large enterprises: Procter & Gamble, Pfizer, ExxonMobil, JPMorgan Chase.

ArnoldIT’s April Holmes put it this way:

IBM has a Tundra truck stuffed with business intelligence, statistics, and analytics tools [SPSS, InfoSphere Streams and Cognos come to mind – ed.] IBM has no product. IBM . . . has an opportunity to charge big bucks to assemble these components into a system that makes customers wheeze, “No one ever got fired for buying IBM.”

Promising but out of reach? Few have been fired for asking, “Can we afford IBM?” In a recent Technology Review interview, IBM Analytics head Chid Apte admitted that “This technology will form the basis of a new product we will in the future be able to offer all of IBM’s big customers.”

The reasons for the anticipated cost are readily apparent. It has been widely reported that Watson took four years to build, runs on around 2,800 Power7 processor cores, has 15 terabytes of main memory, can operate at 80 teraflops (80 trillion operations per second), and employs IBM’s SONAS file system with a capacity of 21 terabytes. Watson software components included some familiar open source technologies IBM had already adopted elsewhere, such as Eclipse and Apache Hadoop, but new ground was broken in creating a natural language understanding system tailored to perform in the Jeopardy! question and answer format. The cost for that capability alone was considerable.

IBM believes this revenue stream will be substantial. According to the Bloomberg article, IBM projects $16B from “business analytics and optimization.” This estimate is probably not unfounded. A 2011 IBM-sponsored study of 3,000 CIOs reportedly found that 4 out of 5 executives indicated that applying analytics to IT operations was part of their “strategic growth plans.”

But what are the prospects for small and medium-sized businesses (SMBs)? Large data warehouses are not only associated with large enterprises. Small firms, even a one-person consultancy, can easily amass huge quantities of data, and may be even more highly motivated to make sense of that data. However, they are unlikely to have Watson-scale budgets.

Still, there are a few possible scenarios in which Watson technology could reach SMBs:

  • Cloud-based Watson resources, with cost reductions made possible by scale (a la Google search), could become more widely available
  • “Watson Light” — Restricted vocabularies and data sources, possibly sold through IBM partners
  • Bundling of certain Watson components with existing, more affordable IBM products
  • A la carte offerings, such as the CRM-integrated “Next Best Action” recommender systems envisioned by Forrester’s James Kobielus
  • Industry-specific offerings in which the raw Watson capabilities are harnessed behind the scenes by IBM specialists

Providing robust hardware and software capabilities to collect, host and access large scale data warehouses using Watson-like technologies is not a near-term possibility for smaller enterprises. It should be remembered that existing natural language technologies, such as the highly effective speech recognition technology Microsoft seamlessly integrated into Vista and Windows 7, have not been widely adopted, even though for many types of human-computer interaction they are efficient and easy to use. Other obstacles await early adopters: problems of data quality, provenance, standardization, consensus building for metadata, and special scalability problems such as disaster recovery (DR) and privacy concerns. Early adopters may rely on third party specialists to pull many of the levers.

Nevertheless, some steps can be taken by SMBs to lay a foundation for the Watson Era:
  • Identify the most high-payoff opportunities, then refine enterprise-specific use cases to match
  • Develop canonical, standardized systems for metadata and taxonomies
  • Leverage existing standards while monitoring current work on evolving standards
  • Develop small, prototype projects using current technologies to assess where payoffs are likely to be for your organization (e.g., low cost experiments with Hadoop or similar technologies)
  • Include nontraditional sources, such as email, web traffic, internal and external documents and project management artifacts
  • Begin to address data quality and provenance by improving internal processes and assigning metrics (even if initially manual)
  • Plan for scaling out warehouses several orders of magnitude beyond current forecasts
  • Collaborate with other groups, especially within industry-specific subcommunities
  • Be on the lookout for template-based “blueprints” that work for industry-specific needs (e.g., subscription-based businesses with periodic renewals, or importers whose margins depend greatly upon shipping costs, etc.)
  • Through internal education, networking, consultants and recruitment, improve staff capabilities and awareness
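The “low cost experiments” bullet can start on a laptop: the canonical Hadoop word-count job can be rehearsed in plain Python (map, shuffle, reduce) before committing to any cluster. A minimal sketch of that three-phase shape (the documents are invented, and real Hadoop adds distribution, sorting and fault tolerance on top):

```python
from collections import defaultdict
from itertools import chain

def mapper(doc):
    """Map phase: emit (word, 1) for every word in a document."""
    for word in doc.lower().split():
        yield word, 1

def reducer(word, counts):
    """Reduce phase: sum the counts collected for one key."""
    return word, sum(counts)

docs = ["watson beat two champions", "two champions lost"]

# Shuffle phase: group mapper output by key, as Hadoop would.
groups = defaultdict(list)
for word, n in chain.from_iterable(mapper(d) for d in docs):
    groups[word].append(n)

result = dict(reducer(w, c) for w, c in groups.items())
print(result)
```

An afternoon with a sketch like this, pointed at a sample of the firm’s own email or documents, tells an SMB more about where the payoff lies than a vendor briefing will.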

Watson technologies are a force to be reckoned with. Just when they will make themselves felt in the marketplace is still guesswork, but savvy early adopters will likely seize opportunities that won’t be so easy to pluck later in the adoption curve.

Use (Corporate Knowledge) or Lose It

Danger Sidekick (credit Wikipedia Commons)

Danger Sidekick (credit Wikipedia Commons)

When a firm decides to shutter operations, the loss of knowledge capital in the form of talent should appear somewhere in the risk assessment. While significant short term savings may be achieved by closing a division (in the case of Microsoft, perhaps to save $$$ to purchase Skype?), one side effect can be a brain drain that amounts to a bonanza for well-heeled competitors. A report from CNN Money today identifies several members of the original Danger (Sidekick) team who are now working at Google’s new innovation wing, “Android Hardware”:

Hershenson and Brit were part of the trio that founded Danger in 2000. The third partner: Android chief Andy Rubin. The three engineers launched pioneering consumer smartphones, like the once-ubiquitous-among-celebrities T-Mobile Sidekick in 2000.

Now all three are working for Google, perhaps with added incentive.

Following was my post to David Pogue’s NY Times story announcing the closing of Cisco’s Flip operation.

DP, you’ve got this mostly right, though I think there is a more disturbing back story that goes beyond this one. It’s the life cycle of smaller to medium sized technology firms whose founders and investors cash out by selling to a major (usually public) company. Another example that comes to mind is Microsoft’s killing off the Sidekick, another neat device paired with an even better cloud service to back it up. What’s gone is more than the idea — seen in its pre-acquisition form, these firms are living, breathing entities, with expert sales and marketing groups, engineers, and idea-makers. Listen up, politicians: THIS is the real “job growth,” not stringing fiber into empty office suites and hosting MS Office training classes for the unemployed. Killing off firms like Danger and Pure Digital aborts the creative offspring that their collective intelligence could manifest. A few among them will have cashed out, but most of those 550 workers will be consigned to endure a personal version of the Flip tragedy. Writ large, it’s the U.S. version of capitalism shooting itself in the foot just when job growth is needed most. Markets dump capital mainly into mega-firms like Cisco, whose far-flung, unwieldy enterprises are far less efficient at converting that cash into good ideas and jobs. (April 14, 2011)