Wednesday, July 13, 2016

How to Set Up Python for Minecraft...

Most of the online tutorials on this topic seem weak, so here is my step-by-step beginner's tutorial on how to set up your Windows PC so that you can start writing Python programs for Minecraft.

Step #1:  Download and install Python (use any version after 3.5.0).  During installation, make sure to check the box to "Add Python to Path".

Step #2:  Download and install the Minecraft Python API and Spigot.  Go to this link and download the file: "Minecraft Tools.zip".  Extract all of the files, then double-click the file: "Install_API".

Step #3:  Run the Spigot Server.  Open the "Minecraft Tools" folder (that you extracted in Step #2), then double-click the file: "Start_Server".  Without closing the Spigot window, open Minecraft and choose "Multiplayer", then "Add Server".  In the server address box, type "localhost", then "Done".  Double-click the server name listed and your game should start.
  • If you have a compatibility issue when you double-click the server name, exit back to the Minecraft launcher and create a new profile. Edit the settings for that profile by selecting an older version release (for example, 1.9.4).

You're now totally set up.  In the future, when you want to write the code for your programs, use IDLE (a text editor installed when you installed Python).  To execute your finished programs, make sure the Spigot server and Minecraft are both running, then simply press "F5" within IDLE.
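
As a first smoke test once everything is running, something like the following could be saved and run from IDLE.  This is only a minimal sketch: it assumes the API installed in Step #2 is the common `mcpi` module talking to the Spigot server on its default local port, which the steps above do not confirm.  The helper names `tower_positions` and `build_tower` are made up for illustration.

```python
def tower_positions(x, y, z, height):
    """Return the (x, y, z) coordinates of a vertical column of blocks."""
    return [(x, y + i, z) for i in range(height)]


def build_tower(height=5, block_id=1):
    """Connect to the local server and build a tower two blocks from the player.

    Assumption: the mcpi module is installed and the Spigot server is running.
    """
    # Imported here so the pure helper above still works without mcpi.
    from mcpi.minecraft import Minecraft

    mc = Minecraft.create()           # connects to localhost by default
    pos = mc.player.getTilePos()      # the player's current block position
    mc.postToChat("Hello from Python!")
    for bx, by, bz in tower_positions(pos.x + 2, pos.y, pos.z, height):
        mc.setBlock(bx, by, bz, block_id)  # block id 1 is stone
```

With the server and Minecraft both running, add a final line `build_tower()` to the file and run it from IDLE.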

Some actual programming examples are coming soon.



  

Thursday, July 07, 2016

Using Tableau for Political Visualizations...

If you're curious about how Data Science is being used to analyze political data, look no further than Tableau - an easy-to-use tool for converting analytics into fabulous visualizations.

Recently, Tableau held a Politics Viz Contest to showcase some of the work their users are generating.  The contest winner, Robert Rouse, used data from a Pew Research Center report to visualize increasing political polarization in the U.S. over the past few decades:

https://www.interworks.com/blog/rrouse/2016/06/24/politics-viz-contest-plotting-political-polarization


Or here is one that I personally found interesting - a visualization of how each state has voted in partisan terms since 1964:

https://public.tableau.com/s/gallery/republican-democrat-voting-1946-2012

Or this alternative visualization of state-by-state partisanship in every election since 1916:


https://public.tableau.com/s/gallery/presidential-election-results-1916-2012


Really, if the conjunction of Data Science and Politics is your jam, then it's worth checking out the entire Tableau gallery to see what people are doing in the field.


  

Tuesday, June 28, 2016

Brexit Voters Were Not Actually Googling "What is the EU?" After They Voted...

The day after the historic "Brexit" vote, where British citizens voted in a public referendum to leave the European Union, a headline from the Washington Post circulated wildly on social media:  "The British Are Frantically Googling What the E.U. Is, Hours After Voting To Leave It".

This was used to ridicule the validity of the results, insinuating that British voters had no idea what they were even voting for or against.  However, the headline is completely disingenuous.

Here's why.  The Washington Post reporter used Google Trends as the source of his information, and noted that the number of searches for the phrase "What happens if we leave the EU" had more than tripled in the hours after polls had closed.  But Google Trends reports search numbers in relative terms and within the date range and context of other trends.

So, for example, the "250% spike" in searches for the E.U. "in the past hour" only means that the number of searches more than tripled relative to the number of searches in the previous hours.  More importantly, it says absolutely nothing about the total number of searches actually made.

What this means is that a very small number of people conducting their searches can lead to Google Trends (or, more accurately, a journalist using Google Trends) completely misrepresenting how many people are doing those searches.  For instance, if 10 people had searched that phrase in the previous hour, and 35 people searched it in the current hour, then Google Trends would note a 250% increase.  Clearly, though, 35 people hardly represent the entire U.K. with its tens of millions of people.

And, in fact, this is what happened the other night.  As Remy Smith uncovered, in the month leading up to the vote, that phrase was receiving about 261 searches per day.  Even after tripling, that still means that fewer than 1,000 individuals actually googled “What is the EU” in response to the “Leave” victory. And that is hardly enough to conclude that large swaths of the population were generally uninformed.
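
To make the arithmetic concrete, here is a minimal Python sketch using the 261-searches-per-day baseline cited above.  The function name is made up, and applying the 250% figure to that daily baseline is an illustrative assumption, not a description of how Google Trends computes anything.

```python
def after_percent_increase(baseline, percent_increase):
    """Return the new count after a relative increase of percent_increase percent."""
    return baseline * (1 + percent_increase / 100)


baseline_per_day = 261  # searches per day cited for the month before the vote

tripled = after_percent_increase(baseline_per_day, 200)  # a 200% increase is a tripling
spiked = after_percent_increase(baseline_per_day, 250)   # a 250% increase is 3.5x

print(f"After tripling: {tripled:.0f} searches")
print(f"After a 250% increase: {spiked:.0f} searches")
```

Either way the total stays under 1,000, which is the point: a large relative jump says very little about absolute volume.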

As if any more context were needed, consider that more people googled "Game of Thrones" that night than "What is the EU?".  By a lot.


  

Thursday, May 05, 2016

Results for the Presidential 'Facebook Primary'...

Now that Donald Trump is officially the presumptive Republican nominee and Hillary Clinton is basically, if not yet officially, the same on the Democratic side, it might be instructive to save, for posterity's sake, the results of Nate Silver's "Facebook Primary" data.

First, let's look at the Democratic campaign.  In terms of raw Facebook "likes", Bernie Sanders beats Hillary Clinton nationwide by a nearly 3-to-1 margin.  Yet Hillary Clinton is currently leading Bernie Sanders in the popular vote by more than 3 million votes and a 56%-42% margin.

Number of raw Facebook "likes" for each candidate, by county (Sanders in purple, Clinton in green)...


Another interesting metric is which candidates perform best on Facebook by county relative to their national share of "likes".  In other words, this map shows each candidate's strongholds of support (Sanders in purple, Clinton in green)...


Next, let's look at the Republican campaign, where, in terms of raw Facebook "likes", Donald Trump has more than all other candidates combined (Trump in orange, Cruz in red, and no other candidate registers enough likes to be on the map)...


And, again, here's a map showing each candidate's strongholds of support (Trump in orange, Cruz in red, Kasich in purple)...


The million-dollar question, of course, is to what extent Facebook can be a predictor of electoral success.  Based on the numbers, it hasn't been a very good predictor in the Democratic race, but has been a good one in the Republican race.  That's one correct call out of two, or a 50% rate of accuracy.  You don't need to be a statistician to see that 50% accuracy in predicting elections essentially means that it's not accurate at all.


  

Monday, March 21, 2016

CUNY's Women in Technology Initiative...

We all hope that on some rare occasion we will be fortunate enough to work on something more meaningful, more impactful, than the day-to-day routine aspects of our jobs.  This past January I was fortunate enough to be given the opportunity to co-teach a pilot course with Professor Zachary Dodds that aimed to tackle the challenge of why there are so few female students in Computer Science, and then set out to do something about it.  This effort has now been officially launched as the Women in Technology and Entrepreneurship in New York Initiative.

Please check out the good work the initiative is pursuing at the link above, and check out the video embedded here - you'll not only watch video testimonials from our former students (who represent themselves extremely well, if I may say so), but you'll also catch a fleeting glimpse of yours truly giving a lecture.

It is an honor to be part of this program, and it's the type of thing that I'm grateful that my kids will one day see that I was a part of.




  

Thursday, March 17, 2016

Who is Anonymous Fighting For in Their War Against Donald Trump?

The hacker group Anonymous has declared "total war" on Donald Trump, urging hackers to expose personal information about him and his staff and to disable various websites associated with him including Trump.com, DonaldJTrump.com, CitizensForTrump.com, TrumpChicago.com, etc.

Even though #OpTrump mentions April 1st as its target date, the collective has already posted unverified personal information about the candidate and his staff, including social security numbers.  Trump was also previously targeted by Anonymous last December following his comments about banning Muslims from entering the country.

The reason for the "call-to-arms", as stated in the Anonymous YouTube video, is that "Your inconsistent and hateful campaign has not only shocked the United States of America, you have shocked the entire planet with your appalling actions and ideals".

This action shouldn't surprise anyone because this is basically what Anonymous does.  What's interesting is that this time, rather than go after targets affiliated with the Ku Klux Klan or ISIS, or even Mastercard or Visa, as they have done in the past, they're going after an American presidential candidate who's currently in the process of winning the nomination based on the popular vote.

You don't need to be a Trump supporter to take issue with the fact that, whereas in the past Anonymous has chosen institutional targets like multi-national corporations or terrorist organizations, here they are going after an individual who is being democratically elected.  Rather than being an agent of The People, as they like to perceive themselves, they are actually fighting against what The People are voting for (on the Republican side).

Is Anonymous exposing itself as ideologically liberal, or perhaps even outright partisan?  If they seek to fight and resist the valid results of democratic elections then Anonymous is now showing itself to be anti-democratic in addition to anti-capitalist and anti-free speech (at least, anti-any-speech-that-they-disagree-with).  If that's the case, can someone please explain who exactly they are fighting for?



  

Thursday, March 03, 2016

The Python Elections Library and Other Programming Tools for Election Analysis...


During this campaign season, many political scientists are conducting research experiments that use online datasets as their primary units of analysis.  For instance, in this blog's previous post, real-time Twitter data was collected and then analyzed to get a sense of popular sentiment about Chris Christie's endorsement of Donald Trump.  Scraping data from social media sites is an increasingly common focus of statistical research.

To this end, there are numerous programming tools available to political scientists.  Let me start by saying that, personally, my language of choice for tasks of this sort has long been Java, although this year I've migrated over to Python.  Meanwhile, Web APIs have been around for years so that researchers can easily access the data on Twitter, Facebook, Google, Reddit, etc. Also, for this 2016 election cycle there are a few notable new additions to our collective toolkit that, combined with some useful already-existing tools, really improve a researcher's capabilities.  Here's what I'm using...

  • Tweepy - still the best Python-based library for accessing the Twitter API.
  • Alchemy and NLTK - the most common APIs for language sentiment analysis.
  • The Python Elections Library - a pay-for library that provides access to all federal, state, and local election results, as well as delegate estimates for the presidential nomination contests.
  • Elex - the Associated Press' brand new command-line tool for accessing current election data.
  • The Watson Emotion Analysis API - IBM's brand new API that was just released in Beta. Whereas Alchemy analyzes language sentiment as positive, negative, or neutral, the Emotion Analysis API detects joy, fear, anger, sadness, and disgust, and rates them by order of magnitude.
  • Matplotlib - the common Python library for visualizing the data with charts and graphs (although I'm currently searching for a better visualization tool, if you have recommendations).
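
To give a feel for what three-way sentiment labeling produces, here is a toy Python sketch.  It is not Alchemy or NLTK: the tiny word lists, the `label_sentiment` function, and the sample sentences are all hypothetical stand-ins, and real services use far richer models.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"great", "good", "love", "win", "strong"}
NEGATIVE = {"bad", "hate", "awful", "lose", "weak"}


def label_sentiment(text):
    """Label text as positive, negative, or neutral by counting listed words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample sentences, not real tweets.
samples = [
    "Great endorsement, love it",
    "What an awful idea",
    "Christie endorsed Trump today",
]
for text in samples:
    print(label_sentiment(text), "-", text)
```

A word-counting approach like this misses sarcasm and context entirely, which is one reason to reach for a trained model or a hosted API instead.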

These tools are a great starting point for collecting and analyzing social media data.  Now if only you didn't have to pay a fee for some of these large datasets, or for sentiment analyses of them, then we'd really be cooking.


  

Friday, February 26, 2016

What Do Twitter Users Think of Trump Being Endorsed by Christie?

Just an hour or so ago, New Jersey Governor Chris Christie publicly endorsed Donald Trump for president.  This is turning a lot of heads in pundit circles; however, I just wanted to do a quick analysis of what people were saying about the endorsement on Twitter.

Running a Python script of my own creation, I scraped 949 tweets referring to Trump over the course of a single minute (approximately an hour after the public endorsement).  So this is neither a scientific nor a meaningfully large sample size by any means; rather, I just wanted to get a quick snapshot.  All 949 tweets were then run through IBM's Alchemy engine for sentiment analysis so that we could get a sense of whether tweets about Trump were singing his praises, condemning him, or staking out a more neutral stance.  Here are the data collected:

  • Number of tweets analyzed:  949
  • Number of positive tweets:  254
  • Number of negative tweets:  343
  • Number of neutral tweets:  305

Which can be visualized as follows:
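
As a rough stand-in for a chart, the tallies above can be rendered as a text bar chart.  This is a minimal sketch, not the script used for this post, and the scale of one `#` per 10 tweets is an arbitrary choice.  Note that the three categories sum to 902, so the sketch reports the remaining 47 of the 949 tweets as unlabeled rather than guessing where they belong.

```python
total = 949
counts = {"positive": 254, "negative": 343, "neutral": 305}

# Tweets not accounted for by the three reported categories.
unlabeled = total - sum(counts.values())

for label, n in counts.items():
    bar = "#" * (n // 10)        # one '#' per 10 tweets
    share = 100 * n / total      # share of all tweets analyzed
    print(f"{label:<9}{bar} {n} ({share:.1f}%)")
print(f"unlabeled: {unlabeled}")
```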


Just some real-time API-driven sentiment for your consumption  :-)

  

Sunday, February 21, 2016

Apple Challenges the Federal Court Ruling That It Must Create New Unlocking Software...


Yes, Apple and the federal government are at war with each other. In case you missed it, this week a federal court ordered Apple to unlock the iPhone of one of the San Bernardino terrorists who killed 14 people, and Apple is strongly challenging that ruling. The reasons why strike at the very heart of not only digital privacy rights (which is how many are framing it), but also what constitutional checks there will be on the power of the federal government moving forward.

The center of this controversy is not whether the court can compel Apple to unlock an iPhone to provide user data to law enforcement authorities (it's legally clear that it can with a subpoena, and Apple has already done so with all of the San Bernardino shooter's data stored on iCloud).  The real issue is whether the court can compel Apple to create new software.

It comes across as absurd that the stated legal basis for the court's ruling is the All Writs Act of 1789, which is used to require people or businesses not involved in a case to execute court orders.  I mean, come on.  Using a law from 1789 to judge a case on digital encryption and software development severely damages the ruling's credibility in the public's eye, to say the least.  One could almost feel the palpable collective eyeroll most Americans had upon hearing this.

The fact is, despite the All Writs Act of 1789, the Supreme Court has ruled that "the government cannot compel a third-party that is not involved in a crime to assist law enforcement if doing so would place 'unreasonable burdens' on it".  Forcing Apple to create brand new software seems to be just such an unreasonable burden.

Of even more direct relevance is the case Bernstein v. Department of Justice, where a college student challenged a federal law that classified strong encryption tools as a form of arms munitions, and therefore forced him to register as an arms dealer in order to publish his encryption algorithm.  In 1999, a judge ruled that the government cannot regulate cryptographic "software and related devices and technologies" because doing so would violate First Amendment protections of free speech on the grounds of prior restraint.

All of Silicon Valley, as well as the hacktivist community at large, has come out in support of Apple's legal challenge.  To what extent should the government be able to force third-party companies or individuals not just to hand over subpoenaed information, but actually to create new products to assist the government in carrying out its duties?  We all want to be safe, and assisting law enforcement in preventing attacks is clearly a public good.  Again, Apple has already done that by handing over all of the data on the shooter they had access to.  But a requirement to force a company to create new software (which, by the way, would be the equivalent of forcing the company to make an inferior product because it would turn all iOS products into less secure devices and, by doing so, would severely damage the company's commercial reputation)?  The line has to be drawn somewhere.


  

Monday, February 08, 2016

When Hacktivism Should Receive First Amendment Protection...

I recently came across an article by Noah C. N. Hampson titled, "Hacktivism: A New Breed of Protest in a Networked World", in the Boston College International and Comparative Law Review.  It's a great entry piece for anyone looking to become acquainted with the issue of hacktivism in relation to the First Amendment.

Hampson's main argument is that some forms of hacktivism are primarily expressive, not destructive, and that these acts resemble traditional forms of protest closely enough to warrant protection from anti-hacking laws under widely accepted principles of Free Speech.  Specifically, he calls out the Computer Fraud and Abuse Act in the U.S. for being so generally worded as to be dangerous in its potentially overly broad application.

He goes on to make several noteworthy comments:
  1. That "time, place, and manner" restrictions on the Internet are unclear.  These are the laws that allow the authorities to determine when and how protests can occur (knowing they cannot censor or ban them altogether) in order to maintain public safety.

  2. The "public forum doctrine" on the Internet is also unclear.  This generally protects speech in "places by which long tradition or by government fiat have been devoted to assembly or debate".

  3. The distinction between public versus private spaces on the Internet is often blurred.  Again, this relates to the different bodies of law that have developed over time regarding the regulation of speech when it occurs on public versus private property.

  4. It does not follow that if ANY harm is caused, then the act should be considered criminal;  some costs (inconveniences/annoyances) must be tolerated as the price for freedom of expression.

These are all important points to raise.  I would like to contribute the observation that all of the first three points are really about the public versus private distinction (or lack thereof).  The "time, place, and manner" of a hacktivist protest cannot be physically relocated the way a planned protest at Madison Square Garden in New York City can be relocated by the authorities to Riverside Park.  On the Internet, that physical dimension is meaningless, and the hacktivist target, by definition, is a server that is most likely a private server, rather than a public sidewalk.  The question is: since the overwhelming majority of cyberspatial activities occur on private servers, to what extent can First Amendment protections that traditionally apply to public spaces still be valid?  For if hacktivism is deemed to only have First Amendment protections when applied to strictly public spaces, you might as well throw out the notion of Free Speech on the Internet altogether.

Which brings us to Hampson's second point which is a foundational question:  To what extent is the Internet a public or a private space at all?  Yes, its functional definition is that it is a network of mostly private networks, which would indicate that it ought to be legally considered a private space in the majority of cases.  However, even though I may connect to the Internet through my private ISP and then my traffic or request for data might at some point traverse the private backbone of a provider like Cogent, all so that I can access my account on one of Facebook's privately owned servers in Silicon Valley, the sum is often considered greater than its parts when it comes to understanding how this thing called the Internet works.  When I engage in cyberspace, I'm typically using only private resources to do what I do, yet there's an argument to be made, and certainly an enormous public perception, that many of these publicly accessible forums are indeed public spaces even if the resources that facilitate them are not.  The First Amendment applies differently to public spaces than to private spaces.  So we might be in need of legal clarification as to the Internet's hybrid status before we can determine its applicability to hacktivism.

Finally, as to Hampson's fourth point listed above, I'm inclined to disagree with his argument that even some harmful acts of hacktivism should warrant First Amendment protection.  Anti-hacking laws such as the CFAA are designed to prevent unlawful intrusions, destruction, and theft.  Yes, perhaps they are too generally worded and in need of revision.  But I'm unsure exactly what type of "harm" Hampson believes ought to be tolerated.  He discusses website defacements, denial-of-service attacks that effectively shut down target websites, and other hacktivist tactics, although even after reading the article I find myself confused as to why he believes some of these are "harmful" while others are not.  What test should be applied?

Anyway, this is terrific food for thought.  I'm left questioning the public-private debate and how best to define "harm" in a hacktivist context.  All First Amendment considerations will stem from these starting assumptions.


      

    Sunday, January 24, 2016

    Yet Another "Has the Bitcoin Experiment Failed?" Post...

    Bitcoin is facing an existential crisis.  Two weeks ago, in one of the most stunning op-eds by an insider you might ever come across, Mike Hearn announced his resignation from the Bitcoin project and stated how he has come to, in no uncertain terms, "the now inescapable conclusion that [Bitcoin] has failed".

    Past tense.

    On the surface, it might seem that the problem is the 1 MB block size limit, a technical measure which affects the number and speed of transactions that Bitcoin can handle.  However, that can be changed.  The real issue, as Hearn sees it, is "people problems".
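
For a sense of scale, here is a back-of-the-envelope Python sketch of what a 1 MB cap implies for throughput.  The 10-minute block interval is Bitcoin's design target; the 250-byte average transaction size is an assumed round figure for illustration only, and real averages vary.

```python
BLOCK_SIZE_BYTES = 1_000_000     # the 1 MB block size limit
BLOCK_INTERVAL_SECONDS = 600     # target of roughly one block every 10 minutes
AVG_TX_BYTES = 250               # assumption: illustrative average transaction size


def max_tx_per_second(block_size, avg_tx_size, interval):
    """Upper bound on transactions per second if every block is full."""
    return (block_size / avg_tx_size) / interval


tps = max_tx_per_second(BLOCK_SIZE_BYTES, AVG_TX_BYTES, BLOCK_INTERVAL_SECONDS)
print(f"About {tps:.1f} transactions per second")
```

Under these assumptions the ceiling is in the single digits per second, and it scales linearly with the block size, which is why the limit is such a contested lever.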

    There are currently two competing visions for Bitcoin - one in which Bitcoin expands its commercial potential and gains mainstream usage; another in which it challenges the existing global financial system.  Or, as Fred Wilson told Business Insider, "Should Bitcoin be Gold or should Bitcoin be Visa? If it is Gold, it’s a store of wealth and something to peg value to. If it is Visa, then its a transactional network that can move wealth around the globe in a nanosecond."

    Hearn describes how the camp in favor of raising the block size limit, known as the Bitcoin XT camp, which includes himself, has been the target of cyberattacks in recent months and how its views have been censored on online discussion boards.

    He also points out that "the block chain is controlled by Chinese miners, just two of whom control more than 50% of the hash power. At a recent conference over 95% of hashing power was controlled by a handful of guys sitting on a single stage...  Bitcoin has no future whilst it’s controlled by fewer than 10 people".  This, of course, raises the question: What has happened to the vision of a decentralized currency?

    Meanwhile, in the storm of fury that came in reaction to Hearn's revelatory post, conspiracy theories have begun circulating across the Internet highlighting the fact that Hearn has since accepted a job at R3 - a startup backed by some of the biggest financial institutions around, including Morgan Stanley, Goldman Sachs, JP Morgan, and UBS.  There is a clear insinuation that this entire episode is "some sort of banker conspiracy".

    Meanwhile, not to be considered an afterthought, the value of Bitcoins plummeted on the very day Hearn's post was published, but has since begun a gradual recovery:




    In the week-and-a-half since, much has been written about Bitcoin's demise, and even death.  This is probably an overreaction.  As former Wall Street Journal reporter Daniel Palmer summed it up, the big idea of a "software-driven governance system" is very much alive; however, the possibility of Bitcoin becoming a mass-adopted currency is dead, and it simply won't become "a replacement for the dollar" anytime soon.

    That's the analysis that rings most true here as well.  As this existential dispute among Bitcoin's core developers comes to light, it would seem foolish for anyone to start trading in their dollars or euros for satoshis, but it would also seem equally foolish to completely disregard the fact that Wall Street firms have seriously begun adopting the blockchain approach to transaction-handling in clearinghouses and elsewhere.

    The idea still has a bright future, even if my CoinBase account does not.

      

    Saturday, January 16, 2016

    A Few Takeaways from My Reddit AMA...

    Last week I had the opportunity to do an AMA (Ask Me Anything) spotlight on Reddit in the /r/Politics subsection.  Here is a link to the entire transcript.  Now that a few days have passed, here are some observations for anyone who may be considering something similar going forward.
    • The participating audience asked many well-informed, intelligent questions and the replies were generally respectful and civil in tone.  I think people's biggest fear about doing such online Q&As is trolls and having the dialogue break down toward the lowest common denominator.  That was not the experience here.  Reddit's userbase quite impressed me, actually.
    • "Ask Me Anything" should be taken literally.  The very first question I was asked was personal and straight to the point: "Who is your favorite presidential candidate right now?".  Indeed, the entire line of questioning ranged from identifying my personal politics, to where I work, to my positions on different issues, and much more.  Which was fine.  But anyone thinking that they'd be able to stick to a script of some type - academic, professional, etc. - and be evasive in the face of personal questions should reconsider whether an AMA is a good fit for them.
    • Don't expect any monetization.  Admittedly, I was originally motivated to do the AMA as a way of marketing my book.  I made sure to follow the site's recommendation not to push your sales pitch too hard, but I still included my book's title in the AMA title and also posted a link to my blog.  In the end, despite the /r/Politics subreddit having over 3 million subscribers, several thousand active online during the AMA, the AMA receiving 194 points and 178 comments, it all resulted in just one book sold through Amazon.  On the other hand, my blog did roughly triple its usual amount of traffic for the week and I acquired a healthy number of new subscribers. The takeaway here: AMAs are good for expanding your audience, but not so much for sales.
    Just some food for thought.  But, overall, it was a surprisingly positive experience.


      

    Wednesday, November 25, 2015

    A Facebook Voting Model for Predicting Elections...

    The race has been on for the past several years for scholars to figure out exactly how to best use social media metrics to help predict electoral outcomes.  Matthew MacWilliams is the most recent entrant into the fray.  He recently published an article in Political Science and Politics in which he proposes a new model for forecasting elections incorporating Facebook data - specifically, the number of Facebook "Likes" a candidate's page receives as well as Facebook's "People Talking About This" (PTAT) statistic.

    For those of you who may be skeptical of a purely Internet effect on voting behavior, these were not the only variables used.  Rather, MacWilliams factored in the Facebook participation variable along with other well-established metrics that also track partisanship and incumbency advantage.

    The results:  Testing the model against 15 of the most competitive Senate races in 2012, the Facebook model proved to more accurately predict the Senate winner in 6 of the 8 weeks leading up to Election Day than the fundamentals model that does not include Facebook data.  Additionally, it proved more accurate than "polls-of-polls" forecasts in 5 of those 8 weeks.

    So this new Facebook model seems great, right?  Well, while the research is solid and the experiment thoughtfully constructed, a few questions are dying to be answered.

    For starters, why is Facebook the sole focus of the study, and why only those two metrics?  I understand that Facebook data is methodologically more available and accessible than Big Data datasets, including Twitter, but that alone does not make Facebook the ideal choice for representing the dynamics of what's happening across all of social media.  Different sites have unique characteristics, and thus can skew results.  Granted, something had to be chosen, but ideally some cross-section sample of a number of social media sites might be more instructive.

    Second, when compared to the poll-of-poll averages, the Facebook model was better at forecasting in 5 of the 8 weeks.  Yes, that's more accurate, but 5-3 is barely so, and it's not exactly a resounding confirmation.  Also, as MacWilliams himself points out, the Facebook model was better at forecasting outcomes the farther the prediction was from Election Day.  In other words, the closer to Election Day, the less accurate it was.  WHY?  That's an extremely interesting footnote - and surprisingly counter-intuitive considering that one would expect the online metric to more accurately reflect people's sentiments in real-time.
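
One way to put a number on "barely so" is a simple sign test: if the two forecasts were really equally good, how often would one of them come out ahead in at least 5 of 8 weeks purely by chance?  The sketch below treats each week as an independent coin flip, which is a simplifying assumption (weekly forecasts are surely correlated), so read it as a rough gauge only.

```python
from math import comb


def prob_at_least(wins, trials, p=0.5):
    """Probability of at least `wins` successes in `trials` independent trials."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(wins, trials + 1)
    )


print(f"At least 5 of 8 by chance: {prob_at_least(5, 8):.3f}")
print(f"At least 6 of 8 by chance: {prob_at_least(6, 8):.3f}")
```

Under that coin-flip assumption, a 5-3 split or better turns up more than a third of the time, so on its own it is weak evidence that one forecast beats the other.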

    Finally, the article closes by mentioning its "grand prize" goal:  to be useful in predicting presidential election outcomes.  However, to me it may be more interesting to determine how the Facebook model might hold up when applied to state/local elections.  Because of lower voter turnout and less engagement, generally, in local elections, I could almost make the argument that measuring Facebook "Likes" and the number of "People Talking About This" could either confirm the model's accuracy or completely invalidate it.  On the local level, it seems like it could go either way, which is why it would be great to see an experiment done on that front.

    Overall, this research study was well done in a subfield where more experiments desperately need to be conducted.  For those readers out there not completely sold on the two Facebook metrics used, what alternative social media metrics do you think might be better?



      

    Wednesday, October 28, 2015

    Who Will Write the Rules for Data Privacy? The End of Safe Harbor...

    A few weeks ago the European Court of Justice ruled that the Safe Harbor Agreement between the United States and the European Union was invalid. The ramifications of this are still being felt.

    Safe Harbor has enabled the Googles and Facebooks of the digital world to conduct business on a global scale.  Digital companies based in the U.S. have to comply with very different privacy laws in the U.S. (where privacy laws are weak) as well as in the E.U. (where they are strong).  Since these firms view it as extremely undesirable to re-create how their businesses function in every locality in which they want to operate, the Safe Harbor Agreement instead let U.S. companies voluntarily sign on to providing Safe Harbor protections in exchange for what amounted to automatic pre-approval to do business in all E.U. nations.

    In the U.S., Safe Harbor was seen as a boon for American businesses; in the E.U., it was seen almost immediately after passage as a threat to individual privacy.

    What's interesting about the ECJ's ruling is that Safe Harbor wasn't overturned based on the old criticism of the Agreement relying on the voluntary compliance of companies. Rather, the ruling was based on the threat of U.S. government surveillance.

    Daniel Solove of George Washington University writes that "the main reason for the invalidity of Safe Harbor is the failure of US law to provide adequate limitations and redress from government surveillance, especially NSA surveillance.  In particular, the ECJ was troubled by the fact the NSA could engage in massive surveillance and that US courts had failed to provide a way for people to challenge that surveillance [in court]...  Essentially, the ECJ held that because the NSA's surveillance is virtually unstoppable, the Safe Harbor cannot guarantee an adequate level of protection."

    This is more monumental than most people probably realize.  In the short term, about 4500 U.S. firms have had to scramble to adjust how their digital businesses operate.  But they will adjust.  The longer-term question, however, looms quite large:  Who will write the rules for data privacy? 

    Pressure stemming from the ECJ decision is now forcing American firms to comply with far stronger privacy protections than those they're used to operating under in U.S. law.  And there may be little they can do about it if they want continued access to European markets.

    In this race-to-the-top, the E.U. is now effectively creating large swaths of U.S. data privacy law.

    Solove is right to note that "The costs of NSA surveillance keep mounting" and that "U.S. companies have no love for the NSA or the weak legal protections against government data gathering -- it erodes the trust companies are building with consumers".  Firms like Microsoft and Google have been especially vocal on this front.

    In an attempt to end on a more hopeful note, Solove states that maybe, in the aftermath of Safe Harbor's demise, companies might now have "the leverage and incentive to convince policymakers to better regulate government surveillance".  That would be nice.  But the bigger issue remains whether, in the global digital economy, the anything-goes culture of the virtually-non-existent privacy laws of the U.S. has any chance of enduring while much of the rest of the world dissents.


      

    Thursday, October 08, 2015

    My New Book, "Who Governs the Internet? A Political Architecture", Is Now Available!

     My book, "Who Governs the Internet? A Political Architecture" is finally being published next week. It will be available on Amazon's and Barnes & Noble's websites (links are below), but as a special benefit for all Nerfherder readers who are interested in purchasing a copy, go to the Lexington Books website instead and you can use my "Author" discount code, which will bring the price down to $56. Yes, I know that may seem pretty expensive for a book - I have no say in how the publishing company prices it, and they assure me that since the book's primary marketing target is academic libraries, rather than individuals, this is how it's done. Regardless, I appreciate the support :-)

    It's available right now!

      • (use the author discount code to save $24: LEX30AUTH15) - this is valid until 12/31/15

      

    Thursday, August 27, 2015

    Can Google Rig Democratic Elections? Have They Already?

    What influences voters is a central question in Political Science.  It is widely accepted that, to some extent, people's votes are influenced by the media, family, friends, income, education, and much more.  But last week, research psychologist Robert Epstein wrote a controversial piece in Politico detailing how Google trumps them all and could outright "rig" the 2016 election.  He boldly declares, "the Search Engine Manipulation Effect (SEME) turns out to be one of the largest behavioral effects ever discovered" and that it is "a serious threat to the democratic system of government."

    Based on data collected in a research study, he asserts that Google's search algorithm - the way Google decides in what order to rank search results for a given term - can easily shift the voting preferences of undecided voters by 20% or more, and even up to 80% in some demographic groups - "with virtually no one knowing they are being manipulated".

    His logic is based largely on the fact that 50% of the time Google users only click on the first two results, and that 90% of the time they never click beyond the first page of results.  Therefore, if someone searched for the term, "Chris Christie", for example, whether the algorithm listed on the first page negative stories about the Bridgegate scandal or positive stories about New Jersey's improved budget during Christie's tenure as governor, this would influence undecided voters by 20% or more.  And surely, 20% would be enough to swing an election in a candidate's favor.

    Epstein even suggests that if campaigns stopped flooding the airwaves with media blitzes that cost a fortune in the weeks before an election, and instead focused simply on finding "the right person at Google" who would tweak the algorithm their way, that would have far more of an effect in turning swing voters.

    Afraid yet?  That seems to be the point.  But there are a host of reasons not to take this conspiracy theory at face value.

    First, the point that Google's search algorithm is influential is virtually indisputable, but so what?  Google has for a long time now acted as a gatekeeper, or filter, for what information people ultimately access.  It is not a censor of content, but its rankings basically function as one.  However, it is a huge leap to conclude that "the right person at Google" could decide "which candidate is best for us" and "fiddle with search rankings accordingly".  Outlandish as this may seem, Epstein himself states that this is a "credible scenario under which Google could easily be flipping elections worldwide as you read this".

    It doesn't get much more conspiracy-minded than that.

    For his part, Google senior vice president Amit Singhal responded directly to Epstein's allegations, stating:
    There is absolutely no truth to Epstein’s hypothesis that Google could work secretly to influence election outcomes. Google has never ever re-ranked search results on any topic (including elections) to manipulate user sentiment. Moreover, we do not make any ranking tweaks that are specific to elections or political candidates.

    Second, there is the not-so-small matter of causality.  Epstein suggests that undecided voters click on the first few links about a candidate and make their decision who to vote for based on what they see.  However, what he overlooks is that the opposite is also true.  As people surf the web, and blog, and tweet, and link to stories about candidates, they are the ones determining which links will come up first in the search results.  In other words, people are influencing the algorithm as much as, if not more than, the algorithm is influencing them.

    In this way, many political activists across the ideological spectrum have long sought to game Google's algorithm to their preferred candidate's advantage.  But vocal activists trying to influence people's votes is hardly a new phenomenon and, when they succeed, it counters the notion of top-down algorithmic control.

    Third, Epstein's study recorded undecided voters' preferences after they were exposed to stories listed in search rankings.  But how much staying power did those preferences have?  It would be instructive to know how those undecided voters actually voted on Election Day, and not just who they said they intended to vote for when Election Day eventually rolled around.

    Fourth, there is the completely unfounded argument that "Google’s search algorithm, propelled by user activity, has been determining the outcomes of close elections worldwide for years".  That is flatly absurd, unless it is qualified with a statement that so has television, radio, newspapers, and virtually every other form of modern media.

    Following that thread, use your own judgment to evaluate Epstein's claim that "it's possible that Google decided the winner of the [2014] Indian election. Google’s own daily data on election-related search activity... showed that Narendra Modi, the ultimate winner, outscored his rivals in search activity by more than 25 percent for sixty-one consecutive days before the final votes were cast. That high volume of search activity could easily have been generated by higher search rankings for Modi."

    Is anyone convinced by that?  Isn't it at least possible that higher search activity actually caused the higher search rankings?  (Hint - that's the way the search algorithm actually works.)

    Again, no one disputes the influence that Google search rankings have on a whole host of topics.  This is why an entire industry called SEO has come into existence - to game the algorithm for marketing purposes. And, yes, political campaigns have tried, and will continue to try, to game the algorithm to their preferred candidate's benefit.  However, charging Google executives with "rigging" elections (Epstein's wording, not mine) is grossly irresponsible.


      

    Tuesday, August 11, 2015

    How to Turn Your Old Computer into a Web Server (for free)...

    Years ago, I published a paper titled, "The Configuration and Deployment of Residential Web Servers".  In retrospect, it wasn't the sexiest title, but the idea remains as relevant today as it was then.  For the Internet to remain open and embody democratic values, power needs to be decentralized.  For anyone looking to do something proactive about it, one easy (and free) way of doing so is to turn your beat-up old computer into a fully functional Web server;  the idea being that hosting Web content yourself means that others have less control over what gets published.

    So, in a brief attempt to update my old paper, here are the necessary steps for turning your old computer into a Web server...

    1. Download this MSI file which contains the Apache Web Server software.

    2. Double-click the MSI file to begin installing with the wizard.  Keep all of the defaults.  This should install the Apache Web Server to your "C:\Program Files" directory.

    3. Open the folder "C:/Program Files/Apache Software Foundation/Apache2.4/conf".  Open the file named "httpd.conf" in Notepad.  Scroll down and make sure that the following line is included (change the path to wherever you saved Apache if necessary):  

      • DocumentRoot "C:/Program Files/Apache Software Foundation/Apache2.4/htdocs"
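
    If you'd rather not scroll through the whole file by eye, here is a small Python sketch that pulls out the active DocumentRoot line for you.  The function name find_document_root and the sample text are just illustrations of mine; the sketch skips commented-out lines and only looks for the first match, so treat it as a quick check rather than a full parser for Apache's configuration format.

```python
import re
from typing import Optional


def find_document_root(conf_text: str) -> Optional[str]:
    """Return the path from the first active DocumentRoot line, or None."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out directives are ignored
        match = re.match(r'DocumentRoot\s+"?([^"]+)"?\s*$', stripped, re.IGNORECASE)
        if match:
            return match.group(1).strip()
    return None


# Illustrative sample text; a real httpd.conf is much longer.
sample = '''
# DocumentRoot "C:/old/location"
DocumentRoot "C:/Program Files/Apache Software Foundation/Apache2.4/htdocs"
'''
print(find_document_root(sample))
```

    To run it against your real file, read "httpd.conf" into a string first (for example with pathlib's Path(...).read_text()) and pass that string to the function.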

    That should be it.  To test it, open up your web browser (Firefox, Chrome, etc.) and go to the following URL:  http://127.0.0.1.  That should bring you to a web page that simply says, "It Works!".  You are looking at the "index.html" file inside of your Apache/htdocs folder.
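
    The same test can be run from a script instead of a browser.  This is only a sketch using Python's standard library; the function name server_responds and the 3-second timeout are my own choices, and it checks nothing beyond whether the server answers with a 200 status.

```python
from urllib.error import URLError
from urllib.request import ProxyHandler, build_opener


def server_responds(url: str = "http://127.0.0.1", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with a 200 status."""
    # An empty ProxyHandler skips any system proxy, since the server is local.
    opener = build_opener(ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as "no".
        return False
```

    Calling server_responds() with no arguments checks http://127.0.0.1 on port 80.  If it returns False, the usual suspects are that Apache isn't running or that another program is already using port 80.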

    To actually share the contents of the Apache/htdocs folder on the Web, you need to set up Dynamic DNS.  Long story short, because your home ISP changes your IP address frequently, you'll need to download some free software, called a Dynamic Update Client (DUC), to keep your hostname pointed at your current address automatically.  I recommend using NoIP.com.  Then you'll need to set up Port Forwarding on your router by opening up port 80.

    Any files you save to the Apache/htdocs folder will now be immediately published on the Web.  Not only do you have total control over any websites that you want to create, but you are also serving a higher democratic purpose.


      

    Tuesday, July 21, 2015

    The Courts Will Decide the Fate of Uber and the Gig Economy. For Better or Worse.

    It's impossible to miss all of the news coverage this week related to Uber.  New York City Mayor Bill de Blasio is supporting legislation that will cap new-hire licenses for Uber drivers.  Nearly all of the presidential candidates are also chiming in with their view of whether Uber represents a model example of entrepreneurship or an example of how deregulation leads to wage stagnation hurting middle-class workers.

    At the heart of the debate are these two fundamentally transformative questions:  Are the people who work for Uber employees or independent contractors?  Does Uber, the company, simply put a tool out there and act as a middle-man, or do drivers functionally work for them?

    The relevant statistic:  according to the New York Times, over 160,000 Americans depend on Uber for at least part of their livelihood.  The company directly employs fewer than 4,000 of them.

    This is the culmination of a phenomenon known as "the gig economy".  As more individuals have moved toward freelancing, contracting, or temping work, the result has been an existential shift in the nature of employment itself.  1099s are becoming nearly as common as W2s.

    The argument in favor of the gig economy is that it fosters entrepreneurship and innovation.  It circumvents entrenched interests and monopolistic arrangements, like that of the medallion taxis in New York City.  It offers more opportunities for people to make money, while reducing the role of regulations and bureaucracy that might act as a roadblock.

    On the other hand, the argument against the gig economy is that it leads to income stagnation and economic insecurity for the middle-class.  Re-classifying workers as independent contractors, critics argue, is simply a way for companies to avoid offering workers the protections and benefits they are entitled to under the law, such as abiding by the minimum wage and offering workers' compensation for injuries suffered on the job.

    The legal classification between employees and independent contractors is, and ought to be, determined by the courts.  The key question that courts must answer is "whether a worker is economically dependent on the employer or in business for him or herself".  The courts typically use a six-part test revolving around the following types of questions:

    1. Is the work being performed an integral part of the employer's business?
    2. How much control does the employer exert over workers?
    3. Is the relationship between the two parties permanent or open-ended?

    Where do things currently stand?  Thus far, the conflict is being resolved primarily at the state and local level.  Just last month, the California Labor Commission ruled that an Uber driver was, indeed, an employee deserving of a variety of workplace protections and was not, as the company maintained, an independent contractor.  And, of course, New York City will decide the fate of de Blasio's proposed caps this week.

    However, the larger point here is that the implications of the gig economy go far beyond Uber.  For example, in 2009, the Labor Department sued Cascom for misclassifying workers as independent contractors, and a judge ruled against the company in 2011, awarding nearly $1.5 million in back wages and damages to roughly 250 workers.  In another example, just last month, FedEx agreed to pay $228 million to settle a class-action lawsuit brought by truck drivers who also challenged their classification as independent contractors.

    How "employee" will be defined in the gig economy will have major ramifications for every business model in the years to come - from WalMart carpet installers, to New Media and citizen journalists, to Amazon affiliates, to even the most popular everyday Facebook or Twitter users posting revenue-generating content on social media sites.

    In this new frontier, the hope is that, however the courts ultimately decide, they decide soon.  Some detailed guidelines would be immensely helpful.



      

    Thursday, July 09, 2015

    The Reddit Shutdown: Model Cyber Protest or Temper Tantrum?

    Last Friday approximately 300 discussion forums on Reddit were shut down by their moderators in a show of protest against the firing of Reddit employee Victoria Taylor, the company's director of talent.  Some of the details of this story remain shrouded in mystery - most notably, the reasons why Ms. Taylor was fired, as well as why moderators thought she was so valuable in the first place.  The stated reasons don't seem overly compelling:  that she "coordinated high-profile forums...  would walk participants through the basics of using Reddit, create verified accounts for them to use, and help them introduce themselves to the community".

    As for the more specific reasons behind the protest, the volunteer moderators first posted a document online that asked for better communication with official staff, as well as improved software tools for community management.  Then, yesterday, two of these moderators published an op-ed in the New York Times explaining "Why We Shut Down Reddit's 'Ask Me Anything' Forum".  In it, they describe their "anger at the way the company routinely demands that the volunteers and community accept major changes that reduce our efficiency and increase our workload", "a long pattern of insisting the community and the moderators do more with less", and their desire "to communicate to the relatively tone-deaf company leaders that the pattern of removing tools and failing to improve available tools to the community at large, not merely the moderators, was an affront to the people who use the site".

    Reddit's CEO, Ellen Pao, apologized for not informing the community.  Meanwhile, all of the subreddit forums are back online.

    Should the rest of us care?  On the one hand, because of Reddit's 160 million regular monthly visitors, this is a cyber protest with high visibility and, arguably, impact.  The volunteer moderators expressed their voice effectively in communicating their discontent to their corporate overseers, and did so, publicly, through collective action.  As far as cyber protests go, that's fairly significant.

    On the other hand, Ms. Taylor is still fired and, only a few days later, the subreddits are cruising along as if nothing ever happened.  The practical effect of the cyber protest has been simply to get an apology and to publicly complain about "having to do more with less".

    Unless I'm missing something, that's not exactly the sign of the apocalypse.



      

    Tuesday, July 07, 2015

    The Emerging Bitcoin Governance Regime...

    As someone long immersed in the study of Internet governance, I often find it striking how similar the discussions and activities are surrounding another supposedly "ungovernable" phenomenon...  Bitcoins and alternative cryptocurrencies.

    The Bitcoin system, like the Internet, has a highly decentralized architecture, and this is by design.  But also similar to the Internet, being decentralized is not the same thing as being in a state of anarchy.  Certain clearly identifiable stakeholders have influence in shaping Bitcoin's usage and development, and others even have a demonstrable authority to constrain or enable behavior with intentional effects.

    The open source model has a long-established tradition of decisions being made by "rough consensus".  With Bitcoins, there are three different types of consensus that are all necessary - consensus about rules, consensus about history, and consensus that coins have value.  Because the blockchain at the heart of Bitcoin is based so strongly on distributed copies of transaction histories, any new changes to the system must acquire a rough consensus among the Bitcoin community in order for the new changes to be adopted by others and interoperable with the rest of the currency system.
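
    To make "consensus about history" a little more concrete, here is a toy sketch of a chain of hashed records.  It is emphatically not Bitcoin's real block format (there is no mining, no signatures, and no network), and every name in it is made up for illustration.  It only shows the basic idea that each record commits to the one before it, so anyone holding a copy can detect a rewritten past.

```python
import hashlib
import json


def block_hash(previous_hash: str, transactions: list) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"prev": previous_hash, "txs": transactions}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(batches: list) -> list:
    """Link batches of transactions into a list of blocks."""
    chain = []
    previous = "0" * 64  # placeholder "previous hash" for the first block
    for txs in batches:
        digest = block_hash(previous, txs)
        chain.append({"prev": previous, "txs": txs, "hash": digest})
        previous = digest
    return chain


def is_consistent(chain: list) -> bool:
    """Check that every block still matches its hash and its predecessor."""
    previous = "0" * 64
    for block in chain:
        if block["prev"] != previous:
            return False
        if block_hash(block["prev"], block["txs"]) != block["hash"]:
            return False
        previous = block["hash"]
    return True


history = build_chain([["alice pays bob 1"], ["bob pays carol 1"]])
print(is_consistent(history))  # True
history[0]["txs"][0] = "alice pays bob 100"  # quietly rewrite the past
print(is_consistent(history))  # False
```

    In this toy version, a participant holding a tampered copy fails the check, which hints at why changes in the real system have to win over the rest of the community rather than being slipped in unilaterally.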

    The rough consensus model, then, is the primary way in which decisions get made regarding the technology.  Policies are built into the code itself.

    However, human beings play a large role as well.  Bitcoin Core is software licensed under the MIT open source license, and is the de-facto rulebook of Bitcoin.  So the question for Political Scientists is:  Who exactly is writing the rulebook?

    Officially, anyone can contribute new rules, or ideas for technical improvements, via "pull requests" to Bitcoin Core.  Anyone can formally submit a new Bitcoin Improvement Proposal (BIP) and advocate for their proposal to be adopted, which occurs when it gets published in the numbered Bitcoin Improvement Proposal series.

    In reality, there are a small handful of individuals who have far more policymaking authority than others.  There are currently five developers who maintain Bitcoin Core:  Gavin Andresen, Jeff Garzik, Gregory Maxwell, Wladimir J. van der Laan, and Pieter Wuille.  These are the people who "hold the pen" of the Bitcoin rulebook.  Any rule changes that they make to the code will get shipped in Bitcoin Core and will be followed by default.

    Beyond the Core developers, formal institutions have begun to play a larger role in Bitcoin governance as well.  The Bitcoin Foundation is a nonprofit founded in 2012 whose main roles are 1) to help fund the Core developers out of the Foundation's assets, and 2) to act as "the voice of Bitcoin" while engaged in lobbying national governments around the world who increasingly seek to regulate Bitcoin activity.  Some of the Bitcoin Foundation's board members have been involved with criminal and/or financial troubles, and it remains an open question to what extent the Bitcoin Foundation actually represents the Bitcoin community at-large.

    All of which serves to illustrate just how much governance has already emerged in this supposedly "ungovernable" space.  Just as the Internet's protocols, or "rules", are governed by the rough consensus model led by institutions like ISOC and the W3C, Bitcoin also has a clearly identifiable governance regime that makes decisions based on the rough consensus model, whose rulebook is Bitcoin Core, and whose rules are written by its five Core developers.  And although the role of formal institutions like the Bitcoin Foundation is still unclear, they are quickly becoming recognized as an integral part of the governance equation going forward.

    The bottom line is that, even in decentralized systems, rules are needed just to ensure basic functionality.  And where there are rules, there are rule-makers. 

    Meet the new boss, same as the old boss.