OSdata tech blog
collecting social media for your website

how to build a major motion picture movie studio tool

    This is an example of embarrassing technological ignorance by rich and powerful businessmen.

    Warner Brothers is looking to spend a bunch of money to invest in some company that will build a project that could easily be assigned as a newbie coding project for junior or senior level college students (or very bright sophomores or freshmen or even high school students).

    And yes, I am mocking the rich and powerful Warner Brothers executives for being unable to get this up and running in a couple of weeks, because this is all existing technology and a single skilled professional working full time should be able to build it all quickly.

project summary

    Here is the key paragraph from the Los Angeles Times, Business section, Wednesday, June 18, 2014, “Speed Dating For Tech Start-Ups” by Paresh Dave and Andrea Chang:

    “Warner Bros. hoped to track every Twitter and YouTube video that mentions one of its movies, and promote some of that content on the studio’s own website.”

my reply

    The expensive part of running the project is the servers that collect, store, and evaluate the information. The programming and development can be done with home equipment that many American teenagers already own. For development, the two most expensive items are a working desktop (or laptop) computer and a smart phone.

    The typical middle class college student is already going to have the required equipment, and a computer science major should be able to get the school to provide the server access in trade for independent study credit.

    Over the next few months I will return to this topic and write some real working code for each part of this project and release it for free under Apache License 2.0. Don’t worry, I’ll only write the core portions of the task and leave the fun customization parts for each ambitious newbie to attempt on his or her own. If this were a full time job, I’d be embarrassed to take that long, but I have to write the code and blogs in my spare time.

    The upcoming code (and, yes, I will be adding real working, fully tested code) will be in PHP and MySQL, but the principles should translate easily into your favorite scripting language and SQL.

developer accounts

    Both Twitter and YouTube have APIs for accessing their systems.

    The student will need to go to each site and sign up for an account (and the typical middle class student probably already has the accounts).

    Then upgrade the account to a developer account. This is an expensive step. Both Twitter and YouTube require that the developer be rich enough to own a mobile phone and have that mobile phone connected up to an active phone number. They don’t do this out of overt hatred for the poor, but rather as a lazy corporate method of confirming the identity of those who access their systems. But the requirement does have the unintended racist consequence of locking out many inner city youth. But, as Google proudly announced a month ago, their company hires predominantly White and Asian workers, who might not consider the negative consequences of their decisions on minorities and the poor.

    Google Workforce Demographics Tech 83% Men; Leadership 72% White; Overall 30% Asian; Tech 1% Black; Leadership 1% Hispanic; all categories less than 1% Other.

    Yes, I am a militant advocate of equal rights for everyone, even minorities, the poor, the elderly, the handicapped, gays and lesbians, and others that corporate America tries to ignore.

    Twitter: Twitter Developers

    YouTube: Google Developers


    Both Twitter and YouTube use OAuth to establish communications.

    OAuth is very deterministic. It is fairly straightforward to write the code to establish an OAuth connection. And there are a whole bunch of free open source libraries that can be used by lazy programmers.

    Twitter: OAuth FAQ

    YouTube: YouTube API v2.0 - OAuth 2.0 Authorization
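
    To show how little magic is involved, here is a minimal sketch of the request signing step, assuming the OAuth 1.0a HMAC-SHA1 scheme that Twitter documents (YouTube's OAuth 2.0 works differently and is not covered here). It is written in Python for brevity. The function names, the example URL, and the parameter values are hypothetical, and a real project would probably lean on one of the free libraries instead.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(text):
    # RFC 3986 encoding: only letters, digits, and - . _ ~ stay unescaped.
    return quote(str(text), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    # 1. Percent-encode every key and value, then sort the encoded pairs.
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    # 2. Signature base string: METHOD & encoded URL & encoded parameters.
    #    The url argument is assumed to carry no query string.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(param_string)]
    )
    # 3. The signing key is the two secrets joined by "&".
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

    The returned string would go into the oauth_signature field of the Authorization header. Same inputs, same signature, every time, which is what makes OAuth easy to debug.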

search API

    Once you have your developer account and have successfully connected to it, the next step is to gather the information.

    YouTube has a clumsy method. You will have to make repeated searches on the appropriate keywords.

    YouTube: YouTube Data API (v3) NOTE: Be careful, Google’s own search engine will send you first to the deprecated version 2.0!
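
    The repeated searching can be sketched roughly as follows, assuming the search.list endpoint of the YouTube Data API (v3) and its part, q, type, maxResults, key, and pageToken parameters. The helper names are hypothetical, the HTTP request itself is left out, and the code is Python for brevity. Because each search returns results you have already seen, the second helper keeps only the new video IDs.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(keyword, api_key, page_token=None):
    # Build the URL for one keyword search (one page of up to 50 results).
    query = {
        "part": "snippet",
        "q": keyword,
        "type": "video",
        "maxResults": 50,
        "key": api_key,
    }
    if page_token:
        query["pageToken"] = page_token
    return SEARCH_ENDPOINT + "?" + urlencode(query)


def new_video_ids(response, seen):
    # Return the video IDs in a decoded JSON response that are not yet in `seen`.
    fresh = []
    for item in response.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        if video_id and video_id not in seen:
            seen.add(video_id)
            fresh.append(video_id)
    return fresh
```

    You would call build_search_url once per keyword on a schedule, fetch and decode each response, and pass it through new_video_ids with one long-lived seen set.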

streaming API

    Twitter makes gathering the information easy with a streaming API.

    Twitter: The Streaming APIs

    You could gather the Twitter information through search requests using their REST API, but the streaming approach is going to collect the information much more efficiently and in real time.


    You will need to open a streaming connection on your server.

collecting the stream

    The Twitter stream will arrive as unordered JSON (JavaScript Object Notation): Processing streaming data
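
    A minimal sketch of reading such a stream, assuming one JSON message per line with blank lines sent as keep-alives. The function name is hypothetical and the code is Python for brevity. Since the fields arrive in no guaranteed order, each message is decoded into a dictionary and read by key, never by position.

```python
import json


def parse_stream(lines):
    # Yield one dict per well-formed JSON message in an iterable of text lines.
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or garbled message; skip it
        if isinstance(message, dict):
            yield message
```

    Any iterable of lines will do as input, so the same function can be tested against a saved file before it is pointed at a live connection.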

decouple collection and processing

    Because of the high volumes involved, it is important to decouple the collection and processing of the information. This means that one (or more) server(s) establish and collect the streams and place the results into a temporary store.

    Other servers are fed the information in the temporary store for actual processing. This can be handled by treating the collection server(s) as a really large buffer and having a simple process that farms out the data to the next available processing server.

    As an example, during the final one minute of the Brazil penalty-shootout win over Chile in the 2014 World Cup (soccer or football, depending on your nationality), there were 389,000 tweets (Twitter) about the game and a total of 16.4 million tweets throughout that day (Saturday, 28 June 2014) about the World Cup in general. At the peak of the Seattle Seahawks’ win over the Denver Broncos in the 2014 Super Bowl (American football), there were 382,000 tweets about the game and a total of 24.9 million tweets throughout the entire game.

    As you can see, for popular topics you should expect a very large stream of tweet data. You simply can’t take the time to process these tweets and still keep up with the incoming flow. You must separate collection from processing.
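
    On a single machine the same idea can be sketched with a queue standing in for the temporary store and threads standing in for the processing servers. This is only an illustration in Python with hypothetical names. A production system would put the buffer on separate servers, as described above.

```python
import queue
import threading


def run_pipeline(raw_lines, process, workers=3):
    buffer = queue.Queue()  # stands in for the temporary store
    results = []
    lock = threading.Lock()

    def worker():
        # Each worker takes the next available item, like the next free server.
        while True:
            item = buffer.get()
            if item is None:  # sentinel: no more work
                break
            outcome = process(item)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for line in raw_lines:  # the collector only enqueues; it never processes
        buffer.put(line)
    for _ in threads:  # one sentinel per worker
        buffer.put(None)
    for thread in threads:
        thread.join()
    return results
```

    The collector loop never waits on process, so a slow processing step delays the workers and not the intake. Results come back in whatever order the workers finish.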


    You need to write a simple parser (or use a free open source library).

    You then need to filter out all of the unrelated tweets.

    And finally you need to store the information (aggregation).

    If you are gathering enough information (which might happen with the motion picture studio example), you need to use the map and reduce method. Split the job among multiple processes/servers and then combine the results.
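
    Here is a toy version of filter, aggregate, and map and reduce, assuming each tweet has already been parsed into a dictionary with a text field. The function names and movie titles are hypothetical, the code is Python for brevity, and the chunks are processed one after another here, where a real deployment would hand each chunk to a separate process or server.

```python
from collections import Counter
from functools import reduce


def count_mentions(tweets, keywords):
    # Map step: for one chunk, count how many tweets mention each keyword.
    # Tweets that match no keyword are simply not counted (the filter).
    counts = Counter()
    for tweet in tweets:
        text = tweet.get("text", "").lower()
        for keyword in keywords:
            if keyword.lower() in text:
                counts[keyword] += 1
    return counts


def merge_counts(left, right):
    # Reduce step: adding two Counters sums the per-keyword totals.
    return left + right


def map_reduce(chunks, keywords):
    partials = [count_mentions(chunk, keywords) for chunk in chunks]
    return reduce(merge_counts, partials, Counter())
```

    Because merge_counts only adds totals, the partial results can be combined in any order, which is what lets the work be split up freely.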

    Just as a hint, you might seriously consider doing this processing with BASH. The combination of pipes and excellent text processing tools in the Unix/BASH or Linux/BASH combination might be exactly what you need.

data storage

    How much data are you really storing?

    It is entirely possible that you might be able to get away with a standard SQL database (such as PostgreSQL, MySQL, or even Oracle). If you need to go to something bigger (around 5-10 terabytes of information), there are literally dozens of NoSQL solutions available.

    If you can do the job with a simple SQL database, it will lower your costs and make the project easier to build and maintain.
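
    As a rough idea of how small the simple SQL version can be, here is a sketch that uses Python's built-in SQLite module as a stand-in for MySQL or PostgreSQL. The table layout and function names are hypothetical. The INSERT OR IGNORE syntax is specific to SQLite, so another database would need its own equivalent.

```python
import sqlite3


def open_store(path=":memory:"):
    # Open (or create) the database and make sure the table exists.
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS mentions (
               source    TEXT NOT NULL,
               item_id   TEXT NOT NULL,
               movie     TEXT NOT NULL,
               posted_at TEXT NOT NULL,
               body      TEXT,
               PRIMARY KEY (source, item_id, movie)
           )"""
    )
    return db


def save_mention(db, source, item_id, movie, posted_at, body):
    # The primary key plus INSERT OR IGNORE makes re-delivered items harmless.
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR IGNORE INTO mentions VALUES (?, ?, ?, ?, ?)",
            (source, item_id, movie, posted_at, body),
        )
```

    One table like this covers both tweets and videos, which keeps the reporting queries simple.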


    Now that you have parsed, filtered, and otherwise processed your incoming information, you need to make it available to human beings.

    You need reporting software that provides the business persons with aggregate information.

    And you will need a system that plucks selected messages and videos for automatic placement on the company website.

    When you look at the coding involved, you will see that these two different kinds of reporting have more in common than they are different. Although the output presentation is very different, both tasks will share much of the same code base.
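
    A small sketch of that shared code base, assuming each stored mention is a dictionary with movie, source, item_id, posted_at, and score fields. All of these names are hypothetical, including the score used for ranking, and the code is Python for brevity. Both reports start from the same selection function and differ only in what they do with the rows.

```python
from collections import Counter


def select_mentions(mentions, movie, since=None):
    # Shared selection logic used by both kinds of report.
    # Dates are ISO 8601 strings, so plain string comparison orders them.
    return [
        m for m in mentions
        if m["movie"] == movie and (since is None or m["posted_at"] >= since)
    ]


def aggregate_report(mentions, movie, since=None):
    # For the business persons: number of mentions per source.
    rows = select_mentions(mentions, movie, since)
    return dict(Counter(m["source"] for m in rows))


def website_picks(mentions, movie, limit=3, since=None):
    # For the website: IDs of the highest scoring items.
    rows = select_mentions(mentions, movie, since)
    ranked = sorted(rows, key=lambda m: m["score"], reverse=True)
    return [m["item_id"] for m in ranked[:limit]]
```

    How the score gets assigned is the fun customization part, and a human should probably approve the picks before anything lands on a studio website.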


    As you can see, there is real work involved here, but each step is clearly identifiable and involves techniques that are in some cases 50 or 60 years old.

    A simple version of this system that handles just one movie (or one song or one celebrity or one model of automobile or one fast food product) is something well within the capabilities of a bright college student or a small team of reasonably competent college students.

    Sad that the executives at Warner Brothers couldn’t get this job done in a couple of weeks.


    This comes under the category of owning criticism.

    This web page was banned at the Google+ community Web Developers, Web Designers, Web Coding by moderator +Joost Schuur (Nerdy product guy. Gamer. Touches code (usually in a good way), PlayMob, London, UK) because “You were promoting your own work and it wasn't really very tied to web development or design.”

    Moderator +Andrew Smith (Technology addict, Web Developer, API guru, futsal and football wannabe, and all round nice guy!, Swordfox Design, arrowtown, new zealand) added “Quite frankly this is self promotion, and your site is not of great quality. I can see how +Joost Schuur  would think this of not high enough a standard.”

    Needless to say, I won’t be attempting to share newbie resources at that Google+ community anymore. Recent previous examples (and, now, this does count as promoting my own work): 404 page not found redirects (PHP), creating an object oriented website (PHP), adding a clock and timer in JavaScript, turning dynamic pages into static links (PHP). There are many excellent Google+ communities where my posts have actually been appreciated, including (note that some communities have the same name) Programming, JavaScript, PHP Programmers, Web Development, PHP Developers, Web Development, Shell/Bash Scripting, PHP Developers, and Entry Level Software Developers/Programmers.

    At the beginning of July 2014, reported that this web site had 385 EDU backlinks (that’s the total number of referring educational domains in the U.S., not the raw number of external backlinks). It does not indicate how many are from professors and how many are from some student account. +Joost Schuur’s has 2 EDU backlinks and +Andrew Smith’s Swordfox Design has a resounding ZERO EDU backlinks, so I am not too terribly concerned about their claims that I do a poor job of supporting education.

comments, suggestions, corrections, criticisms

    If you spot an error in fact, grammar, syntax, or spelling, or a broken link, or have additional information, commentary, or constructive criticism, please contact Milo at PO Box 1361, Tustin, California, USA, 92781.

    Copyright © 2014 Milo.