Big Data and IoT in Sports: Forecast Come True

2015-10-22 By knowlengr

In a blog post written in January 2014 at Syncsort.com (“Big Game, Big Data: How Football is Being Transformed by Big Data”), I forecast that Big Data and the Internet of Things would eventually impact major sports in the U.S. A feature story written for CIO magazine by Thor Olavsrud (@ThorOlavsrud) in September 2015 suggests that parts of this forecast may be becoming reality for the National Football League.

Question: How will it affect your bets in fantasy sports? IBM Watson for unstructured expert advice? QlikView and Tableau for analytics?

Image credit: Dave Sizer, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

Computer-Assisted Instruction: Long Overdue in Overburdened STEM Classrooms

2014-10-11 By knowlengr

This weekend the NYT is running a compassionately told story about challenges facing teachers in traditional classroom settings. I truly felt for the novice teachers.

On the other hand, having been in classrooms with unruly students (especially with STEM-heavy curricula), I felt that even the tricks, tactics and cajoling described here would often fail with some of the children. A continual return to behavior management, while necessary, is nothing short of a content pause for attentive students. Worse, in the absence of computer-assisted learning, the ability of a single teacher to track learning needs for the component skills of, say, 25 students (needs that inevitably differ from learner to learner) is spotty at best.

Teachers are implicitly asked to supply this missing computation in the form of homework grading, longer hours and self-produced content. What sort of person would want to accept such an assignment, with some students nurtured while others just muddle through, receiving at best intermittent attention to both subject matter and discipline? While the NYT reporter makes a good case for the importance of behavior management in the classroom, the narrative raises the question of what works in behavior management and whether educational psychology pedagogy in teacher training is up to the task.

By analogy, what is it like to work in a factory where a constant number of products shipped are known to be flawed?

One can hardly blame teachers for a system that begins more as child care than instruction. That it remains so even in classrooms with older students speaks to the Sisyphean nature of the endeavor.

Image Credit: Scientific American story from 1978 provided by Steve Eskow in 2009.

Book Review: Business Storytelling for Dummies

2014-02-05 By knowlengr

Homer’s Odyssey was preserved through an oral tradition of storytelling. NPR listeners are familiar with StoryCorps, a show that features some of the 45,000 interviews recorded by the organization of the same name, and The Moth, another show that features unscripted storytelling.

At the opposite end of this humane tradition is “Death by PowerPoint.” This critique of the slide-based software, so named by Angela Garber, argues that PPT decks are often boring, oversimplified, needlessly stylized, overly complex – or all of the above.

Somewhere in the middle, perhaps, is ordinary business communication, instructional materials and well-tolerated military briefings. After all, David Byrne has shown in his PowerPoint-based art that the software medium itself may not be at fault.

So it is with PowerPoint in mind that I supposed that Dietz and Silverman’s Business Storytelling for Dummies might be a useful mate (or antidote) to the book used for reference when the Microsoft PPT help screens fall short. Business Storytelling is written in a breezy style that belies the importance of this topic. Nonetheless, Dietz and Silverman have crafted a book that satisfied my goal.

I only own two “Dummies” texts, and the other one was printed in 1999, but the brand’s format hasn’t changed much despite the transition to Wiley. The text is strongly edited, by which I mean it follows a somewhat principled structure and doesn’t stray from its pragmatic intention for long.

A Must-Have for PowerPoint Mavens and TED Talk Wannabes

A psychological / linguistic term for storytelling is Narratology: “the branch of knowledge or literary criticism that deals with the structure and function of narrative and its themes, conventions, and symbols.” Business Storytelling doesn’t cover narratology in depth (and fails to mention Roger Schank). While there are relevant and useful peer-reviewed references, they’re sprinkled throughout the text and interspersed with blog posts, anecdotes, books (e.g., Nancy Duarte’s Resonate: Present Visual Stories that Transform Audiences) and case studies. This is not a failing given the book’s stated audience and objectives, but the suggestions made would have been strengthened by an academic-style bibliography.

Offsetting this lack are numerous story examples – both good and bad – which illustrate the authors’ recommendations. Here are a few that may demonstrate the flavor of the book:

Identify a setting, characters, events
Build in empathy for the main character
Ways to include data in a story
Problems with insider and technical terminology
Create multiple layers for a story to reverberate
Ways to make a story memorable
How to demonstrate that the storyteller is simultaneously an effective listener

The book uses cartoon-style iconography for tips, pitfalls, realistic examples, external references and points to remember. Some may find this off-putting, but if you can get past the idiom, it does improve skim speed, which is what’s needed to rework and groom a story (and a PPT deck). Additional worthwhile content is available at the book’s web site.

The business of creating business stories is non-trivial. Some of the unpersuasive examples provided in the book make this clear. Good storytelling calls for multiple talents: creativity, engaging diction, a balance of the simple and the complex, good aesthetic sense, practice, a sort of storytelling erudition and, above all, metacognition of the domain that extends beyond the vanishing point of the story. Good storytellers may do all this instinctively, but, as a good fiction writer will readily attest, a cookbook will not result in a PowerPoint rose as lovely as Faulkner’s “A Rose for Emily.”

You can see a TEDx talk by Dietz in which she introduces some of these concepts.

Feedspot: RSS Reader for Google Reader Diaspora

2013-07-27 By knowlengr

There was plenty of hand-wringing when Google announced that it was ceasing support for Google Reader. As is somewhat typical with Google’s project kills, it was a relatively precipitous decision that had analysts scratching their heads and users scurrying for alternatives.

Feedspot is one of several browser-based Really Simple Syndication (RSS) readers. It offers features similar to Google Reader and, critically for many users, includes Outline Processor Markup Language (OPML) import. OPML was the format Google Reader used to export subscription lists, and it is used by other applications as well, including the venerable Microsoft Outlook.
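To make the OPML import step concrete, here is a minimal Python sketch, using only the standard library, that walks a Google Reader-style OPML export and lists each feed with its folder path. It assumes the common layout in which folders are `<outline>` elements without an `xmlUrl` attribute and feeds are `<outline>` elements that carry one; the sample file, folder name and URLs are hypothetical, and this is not how Feedspot itself implements import.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def iter_feeds(node, path=()):
    """Yield (folder_path, title, xml_url) for every feed under an OPML node.

    Assumption: an <outline> with an xmlUrl attribute is a feed; one without
    it is a folder whose children are walked recursively.
    """
    for child in node.findall("outline"):
        title = child.get("title") or child.get("text") or ""
        url = child.get("xmlUrl")
        if url:
            yield path, title, url
        else:
            yield from iter_feeds(child, path + (title,))

def feeds_from_opml(opml_text):
    """Parse an OPML document string and return a list of feed tuples."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    return [] if body is None else list(iter_feeds(body))

# Hypothetical export with one folder and one top-level feed.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="MusicTech" title="MusicTech">
      <outline type="rss" text="Synth News" xmlUrl="https://example.com/synth.xml"/>
      <outline type="rss" text="DAW Tips" xmlUrl="https://example.com/daw.xml"/>
    </outline>
    <outline type="rss" text="General" xmlUrl="https://example.com/general.xml"/>
  </body>
</opml>"""

feeds = feeds_from_opml(SAMPLE_OPML)
# Count feeds per folder; top-level feeds have an empty path.
per_folder = Counter("/".join(path) or "(top level)" for path, _, _ in feeds)
print(per_folder)
```

Counting feeds per folder this way is one quick way to sanity-check that a large import arrived intact.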

As one of the refugees from Google Reader, I had accumulated a modest, hierarchically structured set of feeds spanning several engagements since 2005. Some folders were quite large (e.g., “MusicTech” had 77 feeds), while others had just a few. I was a fairly energetic user of folders for structuring, and it is easy to imagine that others had more extensive lists. In total, there were around 750 RSS feeds. These were successfully imported into Feedspot in around 20 minutes in the middle of the day (US Eastern time).

Feedspot has the feel of a relatively mature application. The small touches, such as the tooltips, are part of this impression. An example is shown in the screenshot: “Unread article: A blue triangle in the top left corner indicates unread article. . .” Other examples include the list-view toggle and the rightmost “Feedback” button leading to a UserVoice dialog. Feedspot also lets users build RSS collections: essentially, feeds of feeds, at the level of folders.

Screenshot: UserVoice Feedback

A so-called “modern” application must integrate numerous sharing opportunities, and Feedspot follows this trend. Features for sharing content are provided in several spots [sic]. Users can follow other users’ collections (“follow all of my stuff”) and, most obviously, share individual articles.

There are still some rough edges here and there. It was unclear how to add a feed manually, how to place it into a folder, or how to create a hierarchy more than two levels deep (if that is supported).

RSS content is sufficiently important that I have placed my post-Reader bets on more than one pony. Feedspot is one of them. It deserves serious consideration as a Google Reader replacement.

Founder on the Spot

I corresponded with Feedspot founder Agarwal about the company. He disclosed that Feedspot was built using agile methods and that it runs on a LAMP stack. His business goal? He says he always wanted to build “a consumer internet product.” Why RSS? Because, he wrote, “for some users an RSS reader is a must-have product.” But Agarwal believes it is possible to “take an RSS reader to mainstream consumers.”

Future of RSS?

The cancellation of Google Reader had some pundits predicting the end of RSS. Some see the Facebook-dominated landscape as superseding the lowly RSS. The RSS button, they say, is losing ground to “Follow” and “Like” buttons. Indeed, there is a rich future ahead for the underlying data from those interactions. Still, it is safe to say that pronouncements of the death of RSS are premature. RSS is widely used in content management systems (e.g., Blogger, WordPress, SharePoint and Joomla) to import links to relevant articles. This is true for externally produced content, of course, but RSS may have even greater value as a rapid information aggregator for smaller intranets whose limited staffing prevents more sophisticated schemes.
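As a rough illustration of how a CMS or intranet page might import links from a feed, the following Python sketch extracts title and link pairs from the `<item>` elements of an RSS 2.0 document with the standard library's ElementTree. The sample feed is hypothetical, and real-world feeds (Atom, namespaced extensions, malformed XML) would need more defensive handling than shown here.

```python
import xml.etree.ElementTree as ET

def rss_links(rss_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document.

    Items without a <link> are skipped, since there is nothing to import.
    """
    root = ET.fromstring(rss_text)
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            pairs.append((title, link))
    return pairs

# Hypothetical feed with one linkable item and one item lacking a link.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Draft without link</title></item>
  </channel>
</rss>"""

for title, link in rss_links(SAMPLE_RSS):
    print(f"{title}: {link}")
```

Iterating over items only (not the whole channel) keeps the channel's own `<link>` out of the imported list.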

RSS, partly because of its UserLand roots, offers a simple but flexible framework for information management. It doesn’t deliver a directly usable ontology from an automation standpoint, but its wide adoption gives users great leverage to employ ad hoc schemes.

Screenshot: UserVoice Dialog Box

The addition of tagging features in Feedspot strengthens the quasi-ontology capability in RSS. If tags are used judiciously with a controlled vocabulary, search can be more fruitful. As longtime users of Gmail can attest, having to choose between folders and tagging was an uncomfortable either/or decision. Eventually Gmail would offer both for classifying email. Having this capability for managing RSS is helpful.
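The benefit of a controlled vocabulary can be sketched in a few lines of Python: free-form tags are folded into a small set of preferred terms before indexing, so a search on one term also finds articles tagged with its variants. The vocabulary, tag names and articles below are hypothetical, and nothing here reflects how Feedspot actually stores tags.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: preferred tag -> variants to fold into it.
CONTROLLED_VOCAB = {
    "semantic-web": {"semantic web", "semweb", "linked data"},
    "music-tech": {"music tech", "musictech", "audio software"},
}

def normalize_tag(raw):
    """Map a free-form tag to its preferred term, or None if it is not in the vocabulary."""
    tag = raw.strip().lower()
    for preferred, variants in CONTROLLED_VOCAB.items():
        if tag == preferred or tag in variants:
            return preferred
    return None

def build_tag_index(articles):
    """Index article titles by normalized tag; unrecognized tags are dropped."""
    index = defaultdict(set)
    for title, raw_tags in articles:
        for raw in raw_tags:
            tag = normalize_tag(raw)
            if tag is not None:
                index[tag].add(title)
    return index

articles = [
    ("RDF primer", ["SemWeb"]),
    ("Linked data at the library", ["Linked Data", "misc"]),
    ("New synth plugins", ["MusicTech"]),
]
index = build_tag_index(articles)
print(sorted(index["semantic-web"]))
```

Dropping tags outside the vocabulary is a deliberate choice in this sketch; a gentler variant could keep them in a separate bucket for later review.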

There is more fertile ground in the reference community, such as the RSS feeds supported by ResearchGate, CiteULike and academic publishers. If feed schemas become more sophisticated, feed users could deploy them more rapidly. Content publishing based on feeds would more often reach appropriate reader communities. Content delivery at present is notoriously hit-and-miss, especially since bloggers can readily veer off the ostensible topic of the blog.

These are the same growing pains that have made the semantic web’s progress so unsteady. But Feedspot and like tools are perfectly good hammers for the right kind of nail.

Screenshot: Tagging Feature

[schema type=”review” url=”http://knowlengr.com/blog-type/feedspot-rss-reader-for-google-reader-diaspora/” name=”Feedspot: RSS Reader for Google Reader Diaspora” description=”Blog site for Krypton Brothers founder Mark Underwood. He blogs under the name “knowlengr” (knowledge engineer).” rev_name=”Feedspot” rev_body=”A review of the RSS reader Feedspot” author=”Mark Underwood (“knowlengr”)” pubdate=”2013-07-27″ ]

Celebrity’s Anonymous Pen Name ‘Outed’ by Software

2013-07-18 By knowlengr

JGAAP (Java Graphical Authorship Attribution Program)

The role that software plays in stylistic analysis of text is perhaps less surprising to high school and college students than to the general public. The former must submit their essays to software that checks for plagiarism and sometimes also makes quality assessments.

In the recent outing of J.K. Rowling as the writer behind the pen name Robert Galbraith, it was mentioned that software had been used to analyze the text of the Galbraith novel. There exists a family of software used by academics for “authorship attribution,” for example, to determine whether a newly found manuscript was a missing chapter of Don Quijote (a fabricated example). One of these applications is JGAAP, the Java Graphical Authorship Attribution Program. The JGAAP wiki page describes the project as follows:

. . . Java-based, modular, program for textual analysis, text categorization, and authorship attribution i.e. stylometry / textometry. JGAAP is intended to tackle two different problems, firstly to allow people unfamiliar with machine learning and quantitative analysis the ability to use cutting edge techniques on their text based stylometry / textometry problems, and secondly to act as a framework for testing and comparing the effectiveness of different analytic techniques’ performance on text analysis quickly and easily. JGAAP is developed by the Evaluating Variation in Language Laboratory (EVL Lab) and released under the AGPLv3.

Patrick Juola, one of the two academic investigators credited with the analysis (which also drew on suspicions among reporters at the Sunday Times), explained how it was accomplished in the blog Language Log. Juola refers to this subdiscipline as “forensic stylography.”

A one-paragraph extract from Juola’s blog post follows. Note that, in the usual sense of the word, the analysis doesn’t look directly at “meaning.”

The heart of this analysis, of course, is in the details of the word “compared.” Compared what, specifically, and how, specifically. I actually ran four separate types of analyses focusing on four different linguistic variables. While anything can in theory be an informative variable, my work focuses on variables that are easy to compute and that generate a lot of data from a given passage of language. One variable that I used, for example, is the distribution of word lengths. Each novel has a lot of words, each word has a length, and so one can get a robust vector of <X>% of the words in this document have exactly <Y> letters. Using a distance formula (for the mathematically minded, I used the normalized cosine distance formula instead of the more traditional Euclidean distance you remember from high school), I was able to get a measurement of similarity, with 0.0 being identity and progressively higher numbers being greater dissimilarity.
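The word-length variable Juola describes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Juola's code or JGAAP's implementation: words are taken to be runs of ASCII letters, lengths above 20 are lumped into one bucket, and cosine distance is computed as one minus cosine similarity, so identical distributions score 0.0 and higher values mean greater dissimilarity. The candidate texts are toy examples; a real attribution study would use full novels and several variables, as Juola did.

```python
import math
import re
from collections import Counter

MAX_LEN = 20  # assumption: words longer than this share the last bucket

def word_length_distribution(text):
    """Fraction of words having exactly 1, 2, ..., MAX_LEN letters."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(min(len(w), MAX_LEN) for w in words)
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return [counts[n] / total for n in range(1, MAX_LEN + 1)]

def cosine_distance(u, v):
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty profile as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)

def rank_candidates(disputed, candidates):
    """Return (author, distance) pairs sorted from most to least similar."""
    target = word_length_distribution(disputed)
    scores = {name: cosine_distance(target, word_length_distribution(text))
              for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Toy samples standing in for each candidate author's known writing.
candidates = {
    "Author A": "the cat sat on the mat and the dog ran to the log",
    "Author B": "extraordinarily complicated constitutional considerations",
}
print(rank_candidates("a dog ran to the log", candidates))
```

Because only word lengths are compared, the sketch, like the analysis quoted above, never looks at what the words mean.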

Cool Socnet Visualization from MIT’s Immersion Project

2013-07-09 By knowlengr

A previous post considered some practical implications for privacy and government surveillance stemming from the Snowden revelations about the Prism program. The point was made that some people who think they have nothing to hide could easily become ensnared in webs not of their own making, and could find it difficult to untangle themselves.

Interest in metadata patterns in social networks is not limited to the NSA. Prism is one of a number of academic, Homeland Security and Department of Defense programs that have studied how to make sense of social communication patterns to identify and track suspects. One of these is MIT’s Immersion project.

Following a tip from Slashdot, the Immersion project was given the keys to the author’s hyperactive Gmail account (an inbox of roughly 169,000 messages, 120 filters and 250 labels). Immersion analyzes a Gmail account without directly accessing one’s Gmail password.

The attached images were produced by Immersion after analyzing 277,843 emails. As the MIT project team explains,

Once you log in, Immersion will use only the From, To, Cc and Timestamp fields of the emails in the account you are signing in with. It will not access the subject or the body content of any of your emails.
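To see how much can be inferred from headers alone, here is a minimal Python sketch that, like the MIT team's description, reads only the From, To and Cc fields (no subject or body) and counts how many messages the account owner exchanged with each contact. The dictionary-per-message input format and all addresses are hypothetical, and this is not Immersion's algorithm, whose details the quoted text does not describe.

```python
from collections import Counter
from email.utils import getaddresses

def participants(msg):
    """Lower-cased addresses found in the From, To and Cc headers of one message."""
    fields = [msg[h] for h in ("From", "To", "Cc") if msg.get(h)]
    return {addr.lower() for _, addr in getaddresses(fields) if addr}

def contact_counts(messages, me):
    """Count, per contact, the messages that involve both `me` and that contact."""
    me = me.lower()
    counts = Counter()
    for msg in messages:
        people = participants(msg)
        if me in people:
            counts.update(people - {me})
    return counts

# Hypothetical header-only records; no subjects or bodies are needed.
messages = [
    {"From": "me@example.com", "To": "sister@example.com"},
    {"From": "colleague@example.com", "To": "me@example.com",
     "Cc": "sister@example.com, boss@example.com"},
    {"From": "sister@example.com", "To": "me@example.com"},
]
print(contact_counts(messages, "me@example.com").most_common(1))
```

Even this crude tally ranks contacts by closeness without reading a single message body.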

The point? As Slashdot’s “Judgecorp” points out, Immersion gives even a casual observer a sense for what the NSA Prism initiative could do with metadata.

Immersion can also objectively respond to your mother’s “Why don’t you ever write?” complaint. When used to analyze a single contact, Immersion produces a plot of interactions by year, as depicted in the screenshots.
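A per-contact timeline of this kind can be approximated by bucketing header timestamps by year. The Python sketch below is an illustration under assumptions, not Immersion's implementation: the message dictionaries, the use of a "Timestamp" key holding an RFC 2822 date string, and all addresses are hypothetical.

```python
from collections import Counter
from email.utils import getaddresses, parsedate_to_datetime

def yearly_interactions(messages, me, contact):
    """Count messages per year in which both `me` and `contact` appear in From/To/Cc.

    Assumes each message is a dict whose "Timestamp" value is an RFC 2822 date string.
    """
    me, contact = me.lower(), contact.lower()
    per_year = Counter()
    for msg in messages:
        fields = [msg[h] for h in ("From", "To", "Cc") if msg.get(h)]
        people = {addr.lower() for _, addr in getaddresses(fields) if addr}
        if me in people and contact in people:
            per_year[parsedate_to_datetime(msg["Timestamp"]).year] += 1
    return dict(sorted(per_year.items()))

messages = [
    {"From": "me@example.com", "To": "sister@example.com",
     "Timestamp": "Mon, 02 Jan 2012 09:00:00 -0500"},
    {"From": "sister@example.com", "To": "me@example.com",
     "Timestamp": "Tue, 09 Jul 2013 10:00:00 -0400"},
    {"From": "me@example.com", "To": "sister@example.com",
     "Timestamp": "Wed, 10 Jul 2013 08:30:00 -0400"},
    {"From": "me@example.com", "To": "boss@example.com",
     "Timestamp": "Thu, 11 Jul 2013 12:00:00 -0400"},
]
print(yearly_interactions(messages, "me@example.com", "sister@example.com"))
```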

Yes, writing my sister more often would be a good idea.

As often highlighted at GlitchReporter.com, things in information technology can sometimes go wrong. Spam, misaddressed email, malware or sheer coincidence could put your name on the receiving end of an arrow in an Immersion diagram.

Screenshot: Immersion project snapshot, 2013-07-09 (sidebar)

Screenshot: Immersion project snapshot, 2013-07-09 (no labels)

First posted at Port Wash Patch.

Bush the Elder’s “Vision Thing”

2013-05-15 By knowlengr

A colleague suggested a TED talk by Simon Sinek on “leadership.” Can any talk or book about “leadership” be credible?

I am suspicious of someone who casually proposes that humans are motivated “by biology not psychology.” As if these could be cleanly partitioned off from one another.

I can perhaps overlook that oversimplification.

But most organizations “believe” many things. Concurrence among employee/vendor teams, if it could be measured, would surely cut across many beliefs and ideas. It would be difficult to prove that what motivates people is directly causative of a given enterprise’s success. Being motivated can lead to good as well as bad results. There are good and bad, successful and unsuccessful visions that can be communicated (or mis-communicated) to prospective cult members. Many a startup with great vision, collective commitment, and a focus on “why,” not just “what,” will fail to make the cut.

Inspirational, powerful rhetoric is great (and its absence is painful), but show me what Sinek in his talk disparagingly refers to as “the 12 point plan,” too. A core principle in understanding how people operate, I believe, is the notion that knowledge, and the pursuit of it in an enterprise, is intersubjective. That means, at some level, distrusting not only the expressed beliefs of others, but one’s own instincts to believe.

Maybe Sinek is simply restating what Bush the Elder reportedly called “the vision thing.” Give it its due, but no more.

Will $100M Trickle Watson Down to SMB Enterprises?

2011-06-02 By knowlengr

Bloomberg News reported that IBM plans to invest an additional $100 million in its Watson technology. Earlier in 2011, Watson exceeded previously unmet expectations for artificial intelligence by easily overwhelming two Jeopardy! champions on national TV. While Watson-like technologies could be used in a variety of settings (e.g., network management or health care), the steep investments IBM has already made suggest that the global services giant has its eye on a revenue stream whose major tributaries are large enterprises: Procter & Gamble, Pfizer, ExxonMobil, JPMorgan Chase.

ArnoldIT’s April Holmes put it this way:

IBM has a Tundra truck stuffed with business intelligence, statistics, and analytics tools [SPSS, InfoSphere Streams and Cognos come to mind – ed.] IBM has no product. IBM . . . has an opportunity to charge big bucks to assemble these components into a system that makes customers wheeze, “No one ever got fired for buying IBM.”

Promising but out of reach? Few have been fired for asking, “Can we afford IBM?” In a recent Technology Review interview, IBM Analytics head Chid Apte admitted that “This technology will form the basis of a new product we will in the future be able to offer all of IBM’s big customers.”

The reasons for the anticipated cost are readily apparent. It has been widely reported that Watson took four years to build, runs on around 2,800 Power7 processor cores, has 15 terabytes of main memory, can operate at 80 teraflops (80 trillion floating-point operations per second), and employs IBM’s SONAS file system with a capacity of 21 terabytes. Watson’s software components included some familiar open source technologies IBM had already adopted elsewhere, such as Eclipse and Apache Hadoop, but new ground was broken in creating a natural language understanding system tailored to the Jeopardy! question-and-answer format. The cost for that capability alone was considerable.
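A back-of-the-envelope division of the reported figures gives a feel for the per-core scale involved. It uses only the approximate numbers cited (about 2,800 cores, 15 TB of memory, 80 teraflops), treats 1 TB as $10^{12}$ bytes, and assumes, for illustration only, that compute and memory are spread evenly across cores.

```latex
\frac{80 \times 10^{12}\ \text{FLOP/s}}{2{,}800\ \text{cores}}
  \approx 2.9 \times 10^{10}\ \text{FLOP/s per core}
  \approx 29\ \text{GFLOP/s per core}

\frac{15 \times 10^{12}\ \text{bytes}}{2{,}800\ \text{cores}}
  \approx 5.4 \times 10^{9}\ \text{bytes}
  \approx 5.4\ \text{GB per core}
```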

IBM believes this revenue stream will be substantial. According to the Bloomberg article, IBM projects $16B from “business analytics and optimization.” This estimate is probably not unfounded. A 2011 IBM-sponsored study of 3,000 CIOs reportedly found that 4 out of 5 executives indicated that applying analytics to IT operations was part of their “strategic growth plans.”

But what are the prospects for small and medium-sized enterprises (SMBs)? Large data warehouses are not associated only with large enterprises. Small firms, even a one-person consultancy, can easily amass huge quantities of data, and may be even more highly motivated to make sense of that data. However, they are unlikely to have Watson-scale budgets.

Still, there are a few possible scenarios in which Watson technology could reach SMBs:

Cloud-based Watson resources, with cost reductions made possible by scale (à la Google search), could become more widely available
“Watson Light” — Restricted vocabularies and data sources, possibly sold through IBM partners
Bundling of certain Watson components with existing, more affordable IBM products
A la carte offerings, such as the CRM-integrated “Next Best Action” recommender systems envisioned by Forrester’s James Kobielus
Industry-specific offerings in which the raw Watson capabilities are harnessed behind the scenes by IBM specialists

Providing robust hardware and software capabilities to collect, host and access large-scale data warehouses using Watson-like technologies is not a near-term possibility for smaller enterprises. It should be remembered that existing natural language technologies, such as the highly effective speech recognition Microsoft seamlessly integrated into Vista and Windows 7, have not been widely adopted, even though for many types of human-computer interaction they are efficient and easy to use. Other obstacles await early adopters: problems of data quality, provenance, standardization, consensus building for metadata, and special scalability problems such as disaster recovery (DR) and privacy concerns. Early adopters may rely on third-party specialists to pull many of the levers.

Nevertheless, SMBs can take some steps to lay a foundation for the Watson Era.

Identify the highest-payoff opportunities, then refine enterprise-specific use cases to match
Develop canonical, standardized systems for metadata and taxonomies
Leverage existing standards while monitoring current work on evolving standards
Develop small, prototype projects using current technologies to assess where payoffs are likely to be for your organization (e.g., low cost experiments with Hadoop or similar technologies)
Include nontraditional sources, such as email, web traffic, internal and external documents and project management artifacts
Begin to address data quality and provenance by improving internal processes and assigning metrics (even if initially manual)
Plan for scaling out warehouses several orders of magnitude beyond current forecasts
Collaborate with other groups, especially within industry-specific subcommunities
Be on the lookout for template-based “blueprints” that work for industry-specific needs (e.g., subscription-based businesses with periodic renewals, or importers whose margins depend greatly upon shipping costs, etc.)
Through internal education, networking, consultants and recruitment, improve staff capabilities and awareness

Watson technologies are a force to be reckoned with. Just when they will make themselves felt in the marketplace is still guesswork, but savvy early adopters will likely seize opportunities that won’t be so easy to pluck later in the adoption curve.