PHPitfalls

After reading the “Securing PHP” article written by James Cunningham, I thought I might gather a few points about using PHP from the developer side. Keep in mind that I’m not a security expert. However, this article contains a few starting points on preventing exploits, making PHP apps perform better, and miscellaneous stuff that I consider to be best practices. Your mileage may (and probably will) vary, so as always: take everything with a grain of salt no matter if you read it here or elsewhere. This is not so much a checklist of concrete actions as a collection of points worth keeping in mind as you code.

PHP? WTF possessed you?!?

The first thing you come across when starting out with PHP is probably the fact that it has an extremely bad reputation. You will hear lots of things, including “it doesn’t scale”, “it’s not a real language”, “it doesn’t have X so it sucks”, “it’s not safe”, or “it’s Blub and you’re too stupid to realize it’s Blub”. From the perspective of the language runtime itself, this is all a lot of crap. Still, the trolls are often correct – though it’s generally not PHP’s fault per se. The blame lies solely with the developers using it. As a PHP adept, this should comfort you because it’s something that can be fixed on your end. You can also derive consolation from the fact that other web languages and frameworks suffer from the same problems; it’s just not generally advertised. The bad news is that PHP application failures are huge and numerous, because the language is both popular and powerful enough to enable truly epic bugs.

The Basics: How PHP Works

Because PHP is so accessible and ubiquitous, there are a lot of people copying and pasting scripts together – people who in a more perfect world would be forbidden by law from ever touching source code. Even when real developers are doing it, hacking an easy language does not absolve anybody from the responsibility of knowing what goes on inside a system behind the scenes.

At the most fundamental level, the webserver hands a request from a browser over to the PHP runtime. This sounds like a really simple concept and for the most part, it is. Nowadays, most serious web servers are configured to shove requests directly into the gaping maw of a long-running PHP process dispatcher. After PHP is done with its thing, it passes the result data back to the webserver, which in turn hands it over to the client browser. Historically, this was not always the case: in the past, web servers often started up a complete PHP instance just for one request. If this sounds inefficient to you, you are absolutely right. After people realized how wasteful this method was, they adopted the current model of re-using PHP instances after they completed their jobs. However, some mass-market ISP hosting plans still use the older model, which is all the more reason to keep an eye on your code performance at all times.

A clean slate
Every developer should internalize this: fundamentally, PHP is a per-request environment. Whatever you did during the last request, the next one will start with a completely blank slate. This stateless paradigm is not very common as web languages go. Many others operate as a persistent environment. PHP’s way of doing things like this is both awesome and problematic, depending on your use case. On the plus side, this allows developers to look at each request as an isolated problem. Also, it’s much more difficult to make a mistake that takes down the entire server. There are fewer memory leaks and other weird effects that come from having a stateful runtime. But on the negative side it also means developers must understand that rebuilding the entire environment for every request comes at a computational cost. Many PHP apps are slow because developers did not consider this cost. It’s our responsibility to keep that startup cost low, so doing as little initialization as possible upfront is always a sensible concept.

Things to avoid:
Anything that smells of gratuitous initialization procedures. Don’t load, check, connect, or compute anything that isn’t needed. PHP is not designed to fulfill your dream of becoming an OOP purist. Avoid huge class hierarchies: keep in mind that all this structure has to be parsed and then instantiated at every request. It’s often better to have a very flat class system. Don’t store big amounts of data in the $_SESSION variable because it, too, has to be reloaded on every request. Unless you’re sure your web server does opcode caching (with APC for example), don’t use huge files full of unnecessary source code.

Things to do:
Procedural programming is not necessarily evil. Do it whenever it has speed and/or simplicity advantages. Only require()/include() files that are definitely needed. Consider using a class loader (carefully) to load functionality modules on demand instead of monolithically including all your stuff upfront.
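To make the class loader idea a bit more concrete, here is a minimal sketch. The function name register_lazy_loader() and the one-class-per-file layout under a classes/ directory are assumptions made for illustration, not something prescribed above. A class file only gets parsed when the class is first used in a request.

```php
<?php
// Hypothetical loader: maps a class name to <baseDir>/<ClassName>.php on first use.
function register_lazy_loader(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir): void {
        // Only accept plain names (letters, digits, underscores), nothing path-like.
        if (!ctype_alnum(str_replace('_', '', $class))) {
            return;
        }
        $file = $baseDir . '/' . $class . '.php';
        if (is_file($file)) {
            require $file; // parsed only when the class is actually needed
        }
    });
}

// Assumed directory layout: one class per file in ./classes
register_lazy_loader(__DIR__ . '/classes');
```

Requests that never touch a given class never pay for loading it, which is the whole point.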

Things to know:
- HTTP request headers: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields
- CGI interface, variables: http://en.wikipedia.org/wiki/Common_Gateway_Interface

Profiling, profiling, profiling…

PHP makes it easy for you to track the amount of processing time and memory your app is using. It is absolutely essential to track this. Intuitively, I’d say that the execution speed of PHP falls somewhere between Ruby and JavaScript(V8) and it’s easy to make mistakes that end up using a lot of memory and/or valuable CPU time.

You don’t even need serverside debuggers or fancy instrumentation to achieve this. The function microtime() returns the current Unix timestamp with microsecond resolution (call it as microtime(true) to get a plain float in seconds), and memory_get_usage() gives you a basic idea about your script’s memory behavior. This makes it easy for you to check the two most fundamental resources at key points during your application’s execution path.

Personally, I like to use an extremely simple profiling function based on the microtime() function. Using a function like this to profile your code will allow you to measure how horrible certain operations really are. For example, connecting to a database. Running regular expressions. It all has a cost and you need to know what it is. It’s always good to know what’s going on behind the scenes – so avoid libraries that obfuscate their behavior for the sake of fake simplicity.
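A profiling function of this kind could look roughly like the sketch below. The name profile_mark() and the shape of the returned array are my own invention for illustration; the only real ingredients are microtime(true) and memory_get_usage().

```php
<?php
// Hypothetical helper: records elapsed time and memory use at named checkpoints.
function profile_mark(string $label): array
{
    static $start = null; // time of the first call in this request
    static $marks = [];   // all checkpoints recorded so far

    $now = microtime(true); // float seconds with microsecond resolution
    if ($start === null) {
        $start = $now;
    }
    $marks[] = [
        'label'  => $label,
        'ms'     => ($now - $start) * 1000.0,
        'memory' => memory_get_usage(),
    ];
    return $marks;
}

profile_mark('start');
usleep(2000); // stand-in for something expensive, e.g. a database connect
$marks = profile_mark('after slow step');

foreach ($marks as $m) {
    printf("%-16s %8.2f ms %10d bytes\n", $m['label'], $m['ms'], $m['memory']);
}
```

Sprinkle calls like these around database connects, regex-heavy code and template rendering, then look at where the milliseconds actually go.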

For more serious profiling and debugging, you should check out: http://xdebug.org/ but microtime() still provides you with valuable and quick information in any PHP environment.

Sane(r) Input and Output

One of the most common publicly visible mistakes is failure to sanitize the input of a web application. It’s important to remember that every single piece of data coming into your app is potentially hazardous. The web really is out to get you! In PHP, the $_REQUEST array contains all the parameters pertaining to the current request and it’s essential not to trust them. Sadly, there is no single way to make this data safe. It depends on what you do with it. On a more positive note, the handling of user input generally falls into one or more of the following categories and there are standard practices you can employ to avoid the worst:

>>> Rule 1 of input data hygiene: Nuke it from orbit, it’s the only way to make sure! <<<

Displaying Data
In the most common scenario, a user submits some kind of text to your application and the app in turn displays that text on the site. Naively displaying whatever the user put in opens a huge opportunity for attacks on your site with an XSS exploit. Thankfully, it’s easy to sanitize this kind of data. In most cases, htmlspecialchars() will take a text and render it harmless by escaping angle brackets and other problematic characters. But in some cases you might want to allow the user to enter markup instead of just plain text. In theory, PHP lets you specify a set of “good” tags and filter all the other ones out with strip_tags(), but this function is horribly unsafe because it leaves the attributes of allowed tags untouched, which lets malicious users sneak JavaScript event attributes into them. That means you have to use something to strip those attributes out as well (there are some examples in the PHP documentation), and this is not trivial. In fact, I believe it’s the single biggest reason why bulletin boards started their own markup language known as BBCode.
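For the plain-text case, a minimal sketch might look like this. The helper name e() is just a convenient assumption, and the ENT_QUOTES flag and UTF-8 charset are choices you should match to your own site.

```php
<?php
// Hypothetical helper name; flags and charset are assumptions, adjust as needed.
function e(string $text): string
{
    // Escapes &, <, > and both kinds of quotes so the text cannot become markup.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("xss")</script>';
echo '<p>' . e($comment) . '</p>', "\n";
// prints: <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The escaping happens at output time, right where the text meets HTML, so the stored data stays untouched.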

Text in Databases
SQL database queries come with a little bit of baggage. Often, you will need to take user input and run queries with it. For example, you might want to store data in the DB or you might want to retrieve it. Most database abstraction libraries will let you use a syntax like this:

SELECT * FROM articles WHERE id = ?

The nice part of having support for placeholders like “?” is that you don’t need to worry about making the content of your variable safe. You can just pass it as another parameter to your query function. Depending on the database and the library used, this might also have the further advantage of enabling the database to precompile the query and execute subsequent queries a bit faster.
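With PDO, the placeholder query above could be used roughly as follows. This is a sketch under assumptions: it uses an in-memory SQLite database (so the pdo_sqlite extension has to be available) and a made-up articles table, purely to stay self-contained.

```php
<?php
// Assumption: pdo_sqlite is available; the in-memory database is a stand-in
// for whatever database server you really use.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)');
$db->exec("INSERT INTO articles (id, title) VALUES (1, 'Hello'), (2, 'World')");

// The query text is fixed; values travel separately as parameters.
$stmt = $db->prepare('SELECT * FROM articles WHERE id = ?');

$stmt->execute([1]);
$legit = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Hostile input is treated as one value, never as part of the SQL.
$stmt->execute(['1 OR 1=1']);
$hostile = $stmt->fetchAll(PDO::FETCH_ASSOC);

printf("legit: %d row(s), hostile: %d row(s)\n", count($legit), count($hostile));
```

Note that the same prepared statement is executed twice, which is where the potential precompilation benefit comes from.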

There are, however, situations where you might not want to use an abstraction layer or library, or might be prevented from using one. Using the built-in MySQL functions can work, too; you just need to be more careful. You do have to take care of properly escaping the variables yourself. There is just one single function that you can use to make data safe for MySQL consumption and its name is mysql_real_escape_string(). If you’re using any other function, stop it immediately. mysql_real_escape_string() is your one and only true lord and savior: worship the blessed bytes it churns out.

Dynamic Code Paths

This is a touchy subject. PHP allows you to do a lot more things with your code based on variables than most other languages. If you’re going to use those features, make sure to have a very good reason for it. If done right, those features can substantially reduce the complexity of your code – but you have to be extremely careful.

Dynamic include()s: PHP allows you to include a file specified in a variable. You can do stuff like this: include('inc/'.$moduleName.'.php');. Contrary to many other people, I think it’s fine to use this feature in principle because it allows you to introduce very simple extension mechanisms into your app and it can help keep your codebase clean. But as always, with this kind of power, comes a huge responsibility: you have to make sure $moduleName is legit and can’t be used to call arbitrary code on your server. A good way to ensure basic sanity inside this variable is to use at least basename($moduleName) on it, but a much better solution would be to strip out any non-alphabetic characters. Nukes. Orbits. See above!
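One way to nuke $moduleName from orbit is sketched below. The helper name module_path() and the list of module names are hypothetical. It goes one step further than stripping characters: anything that is not purely alphabetic, or not on a known list, is rejected outright.

```php
<?php
// Hypothetical helper: returns a path under inc/ for a known module, or null.
function module_path(string $moduleName, array $allowed): ?string
{
    // ctype_alpha() is true only for non-empty strings made of letters,
    // so slashes, dots and null bytes never get through.
    if (!ctype_alpha($moduleName) || !in_array($moduleName, $allowed, true)) {
        return null;
    }
    return 'inc/' . $moduleName . '.php';
}

$allowed = ['news', 'gallery']; // assumed module names
$requested = $_GET['module'] ?? 'news';
$path = is_string($requested) ? module_path($requested, $allowed) : null;

if ($path !== null && is_file($path)) {
    include $path;
}
```

With an explicit list, the character check is almost redundant, but belt and suspenders is the right attitude here.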

Dynamic variables and functions: In PHP, you can access a variable by specifying its name through another variable. For example, if you set $nnn = 'vvv';, you can do $$nnn to access $vvv. But wait, there is more. Suppose you have a function vvv();, you can call this function by writing it as $nnn();. Obviously, this is very powerful stuff, so you have to make sure the “id” variable (in this example $nnn) is sane and can’t be used from the outside to call arbitrary stuff in your app. Contrary to the previous methods, there is no single way to make sure this code can’t be abused: you’ll have to make sure of it in a manner that is appropriate to your code specifically.
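One approach that works in many cases (certainly not the only one) is to never let outside data name a function directly, and to go through a fixed lookup table instead. The function and action names below are made up for illustration.

```php
<?php
function action_list(): string { return 'listing'; }
function action_show(): string { return 'showing'; }

// Hypothetical dispatcher: only names present in the map can ever be called.
function dispatch(string $action): string
{
    $handlers = [
        'list' => 'action_list',
        'show' => 'action_show',
    ];
    if (!isset($handlers[$action])) {
        return 'unknown action';
    }
    $fn = $handlers[$action];
    return $fn(); // variable function call, but the name comes from our own table
}

echo dispatch('list'), "\n";    // listing
echo dispatch('phpinfo'), "\n"; // unknown action
```

The user only ever picks a key; which function that key maps to is decided by your code.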

Eval is evil: for some unknowable reason, many newbie developers seem to be enthralled by eval(), probably because they failed their saving throw against its evil whisperings of doom at some point. In any case, eval() is probably the one function responsible for the most colorful WTFs in coding history.

eval(); allows you to execute arbitrary code by invoking an interpreter inside the interpreter. Developers often use this to implement dynamic features, such as event handlers that can be specified by the users of an application. The dangers here are obvious: there is no way to make this safe. If you allow your users to specify custom code, you damn well better make sure they’re trustworthy, because they will have access to anything and everything on your server. I believe in 99% of all cases, the desire to use eval(); is not even remotely legitimate. However, there might be scenarios where eval() is justified, for example in CMS applications or meta-programming projects. For PHP beginners and intermediates there is just one rule regarding eval(): if you’re using it, you’re doing it wrong. It’s that simple. Don’t listen to the voices!

Calling the shell
Sometimes, when a specific and arcane piece of data processing is required, it can be a good idea to just call a Unix text command from inside your PHP application. If you do that, you have to be aware that your code has a high likelihood of being specific to your server configuration. Chances are, your call won’t work on another configuration. Whenever some tiny aspect of your shell call is influenced by user data, you have to use escapeshellarg() and escapeshellcmd() to sanitize those values.
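As a small sketch of the escaping step: the helper name build_wc_command() and the choice of wc are assumptions for the sake of the example, and the exact quoting shown in the comment is what you get on a Unix-like system.

```php
<?php
// Hypothetical helper: builds a line-count command for a user-supplied file name.
function build_wc_command(string $userFile): string
{
    // escapeshellarg() quotes the value so the shell sees one literal argument.
    // The "--" tells wc that no further options follow.
    return 'wc -l -- ' . escapeshellarg($userFile);
}

$cmd = build_wc_command('notes.txt; rm -rf /');
echo $cmd, "\n";
// On a Unix-like system this prints: wc -l -- 'notes.txt; rm -rf /'
```

The command name itself is a constant here; only the argument comes from the outside, and it arrives wrapped in quotes.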

Regular expressions
Regular expressions are very practical, small and can be highly performant if done right. They do, however, require a specialty tech priest to come in and bless the code. You can’t just copy an arcane regex ritual from the web and expect it to work on your project. Regexes often sit there and look like they’re working, but in reality they’re just lying dormant until they finally betray you at a more opportune time. It’s ridiculously easy to get them wrong, and they don’t lend themselves to bug-spotting at one glance. You probably should not rely on regular expressions to sanitize your data, unless you know exactly what your expression does. Because even if you think you know what it does, chances are high it will do weird things with weird inputs. Even the Supreme Grandmaster Regex Scholars of Perldom get this stuff wrong with a scary high probability. Personally, I’d advise you to steer clear of security-relevant regexes if the same can be done with a few simple and more maintainable lines of real code instead. Orbits. Nukes. Swords. Gunfights.

What to know:
- Regular Expressions: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

Databases

We already talked about database security. As always, there is more. In most cases your database will be MySQL. Like PHP, MySQL has a very bad reputation – so it’s only natural that those two should pair up to be the web server standard configuration of the civilized world. MySQL will be fine for most of your standard storage and query needs. A lot of people tend to prematurely optimize their web projects and choose a NoSQL solution because they believe it’s going to be faster, or even simply because “the cool kids are doing it”. Needless to say, this is not actually warranted in most cases. PHP allows you to go with the database you like the most – if that’s MySQL, you’ll save a lot of configuration trouble. If it’s not MySQL, that’s fine, too. But chances are very good you don’t need a special DB solution for your web project so unless you’re experimenting you’ll probably get the most mileage out of something you already know very well.

What to avoid:
Avoid making a database connection when your request doesn’t need it. And when your request does need it, you should prefer persistent database connections over ad-hoc ones (e.g., use mysql_pconnect()) for the simple reason that it potentially shaves some execution time off the connection part. Avoid executing many queries in favor of consolidating them into just a few. Every time you fire off a query to the DB server, your program has to wait for data to come back. It’s also a good idea to avoid hugely complex queries, especially if you need to plan for scalability. SQL gives you a lot of rope to potentially hang yourself with, so keep performance in mind and measure it!

A word about legacy libraries
You may be wondering why I’m referencing an obsolete MySQL API above. Even though the very first example is all about DB abstraction libraries, people still bring it up. To make it clear: I’m not condoning its usage, but chances are you’ll come across similar problems at some point, for example if you’re debugging legacy code or if you work with other libraries where proper escaping becomes an issue. It’s worth at least being aware of unsanitized input at all times. I’m not advocating the use of failure-prone, low-level, obsolete libraries – I’m trying to talk about being more conscious of dirty data. Again: use PDO, or whatever suits you, and avoid naked mysql_ calls.

What to know:
- MySQL, obviously: http://dev.mysql.com/doc/refman/5.0/en/tutorial.html
- Indexes are essential

File Uploads

File uploading is the act of accepting a file from the browser into your web application. Like all user input, you must be prepared for the worst. People will try to crash your uploading code or they will try to upload executable files onto your server. When accepting an uploaded file, it’s important to check its content first before storing it on the server. For example, if an attacker manages to upload a PHP code file into one of your directories, it’s game over. That’s what happened to the Trojans. One day they allowed a highly executable horse to upload into their root directory. It was not a good day.

To avoid this, you have to make sure that, say, an image file uploaded to your app is actually an image file. The $_FILES variable contains information about a file’s MIME type. Sadly, you have to disregard this information, because it’s been supplied by the user’s browser and is thus utterly evil. Instead, you have to get the actual MIME type of the file directly. In the good old times, you could use mime_content_type(), but the PHP manual flagged this function as deprecated in favor of the Fileinfo extension. Rescue comes from an unexpected place: the image functions documented alongside the GD library include one that, among other things, returns the actual MIME type of an image file: getimagesize(). (It does not actually require GD to be installed.) Use this to check what your uploaded file actually contains and simply reject everything that does not correspond to one of the MIME types explicitly allowed by you.
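A check along those lines could be sketched like this. The function name detect_allowed_image(), the allow-list and the upload field name "picture" are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the detected MIME type if the file looks like
// an allowed image, or null otherwise. The allow-list is an assumption.
function detect_allowed_image(
    string $path,
    array $allowed = ['image/png', 'image/jpeg', 'image/gif']
): ?string {
    // getimagesize() inspects the file content and returns false for non-images.
    // The @ silences the notice it may raise on files too short to identify.
    $info = @getimagesize($path);
    if ($info === false || !in_array($info['mime'], $allowed, true)) {
        return null;
    }
    return $info['mime'];
}

// Typical use with an upload field assumed to be named "picture":
// $mime = detect_allowed_image($_FILES['picture']['tmp_name']);
// if ($mime === null) { /* reject the upload */ }
```

This only tells you what the file claims to be at the byte level. It’s still wise to store uploads under a name and extension you choose yourself, in a directory where PHP files are not executed.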

What to know:
- MIME types: http://en.wikipedia.org/wiki/MIME
- HTTP methods: http://en.wikipedia.org/wiki/HTTP#Request_methods

Extra Credit: Caching

On many servers, you will have access to a service called memcached. It’s essentially a mini server process that allows you to store and retrieve arbitrary data very fast. To save or retrieve a data package from memcache, your application needs to connect to the memcached server and give it the key of the object you’re interested in. This key can be any string. Keep in mind though, that other applications are using the same key/value storage so choose your keys in a manner that does not lead to conflicts. Remember to store user-specific information with a user-specific key.

Like with any server, connecting to memcache and retrieving a cached object costs time. It’s only worth it if that interval is actually shorter than the time it takes to recompute a given data structure.

In the absence of a fancy memcached service, you can simply fake it, too. For example, if your application needs to generate a report or a big chunk of HTML that seldom changes, you can store that data in the file system as well. Simply give your application a tmp/ folder and dump your cached objects there. But while memcached objects expire automatically, you’ll have to take care of the lifecycle of cache files yourself.
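Faking it with the file system might look like the following sketch. The function name cache_fetch(), the .cache file suffix and the use of the system temp directory are assumptions. The expiry is handled by comparing the file’s modification time against a time-to-live in seconds.

```php
<?php
// Hypothetical helper: returns the cached string if it is still fresh,
// otherwise recomputes it, stores it, and returns the new value.
function cache_fetch(string $dir, string $key, int $ttl, callable $compute): string
{
    // Hashing the key keeps odd characters out of the file name.
    $file = $dir . '/' . sha1($key) . '.cache';

    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;
        }
    }

    $value = (string) $compute();
    file_put_contents($file, $value, LOCK_EX); // lock to avoid half-written files
    return $value;
}

// Assumed usage: cache a rarely changing chunk of HTML for five minutes.
$html = cache_fetch(sys_get_temp_dir(), 'front-page-report', 300, function () {
    return '<ul><li>expensive report</li></ul>';
});
```

Stale files are simply overwritten on the next request, but files for keys that are never requested again will pile up, so some occasional cleanup is still on you.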

More Performance Tuning

Keeping your response times low inside the app is just the beginning. There is much more you can do to speed up page loading. Going into all the details here would be drastically out of scope. There are several books dedicated to these problems. Just a few performance hints here that are not generally advertised:

Consolidating static files
PHP makes it ridiculously easy to stream arbitrary content to your users, including JavaScript and CSS files. The nice aspect of serving JavaScript and CSS from PHP is twofold: first, you have full control over the HTTP headers needed to tweak the browser’s caching behavior – this is an option you don’t have when serving static files from your webserver directly in most cases. Second, you can actually concatenate multiple JS or CSS files into one big chunk and give this chunk out in one go.

For example, say your web app needs jQuery, 4 jQuery plugins, 2 custom JS code files and 4 CSS files. A browser must make 11 requests to get those files. While the files themselves may be small, the latencies of requesting them can add up to a considerable delay. What’s worse, you’ll have at least 11 include tags inside your HTML for every request! You could now combine those files manually inside a text editor, but that makes them harder to maintain. It’s easier to just have PHP compile two large files: one JS and one CSS!
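A bare-bones version of such a bundler is sketched below. The function name bundle_files(), the script name bundle-js.php, the file paths and the one-day cache lifetime are all assumptions. Note that the list of files is fixed in the code and never taken from the request.

```php
<?php
// Hypothetical helper: joins the given files into one response body.
function bundle_files(array $paths): string
{
    $parts = [];
    foreach ($paths as $path) {
        if (is_file($path)) {
            $parts[] = file_get_contents($path);
        }
    }
    // A newline between files keeps a trailing line comment in one file
    // from swallowing the first line of the next.
    return implode("\n", $parts);
}

// Assumed usage inside a script such as bundle-js.php:
// header('Content-Type: application/javascript');
// header('Cache-Control: public, max-age=86400');
// echo bundle_files(['js/jquery.js', 'js/plugin-a.js', 'js/app.js']);
```

Since this runs PHP for what used to be a static file, it pairs nicely with caching the bundled result instead of rebuilding it on every request.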

Keep an eye on your output
From time to time, have a look at the HTML code your app is producing. Sometimes it’s amazing how much junk can creep in there. Have a look at every element and ask yourself if this really needs to be in there. The same goes for CSS and JavaScript files: over time they tend to accumulate dead-end sections that are not used anymore. The Unix command "grep" is your friend. With grep, you can search your entire app for all occurrences of a string and it can really help you find out whether a given piece of code is actually used.

You Do Have Comments

That’s it from my end. There is lots more to be said on all of these subjects, but it should be enough to point interested people in the right direction. Chances are, battle-hardened PHP veterans disagree with any or all of these suggestions. Some may even get violently ill upon reading this. Pay attention to them, they might have a point. Or not. Decide what makes sense to you.

Happy coding!


Things I Can’t Do Anymore On Mountain Lion

I’m pretty annoyed at OS X lately, so here are some

Things I Can’t Do Anymore On OS X Mountain Lion (That Were Possible On Lion)

Can’t use as many screens as I’d like

The black surface you see there is one of my screens. Mountain Lion is reasonably certain the display exists, it just won’t actually show anything on it anymore. It’s just black, except for the mouse pointer. That still gets displayed, but no desktop, no windows. Even more annoying, you can have windows on that screen but they won’t be shown. So if you’re wondering where that Finder window is you just opened, it’s probably under the black shroud.

Mountain Lion doesn’t remember screen arrangements

Every second reboot or so, ML forgets the placement of its displays. It will then either assume they’re all in a single line in some arbitrary order, or it will methodically switch all the screens left to the center with those right of the center. I expect this shit from Linux, but apparently only Microsoft can do multi screens correctly now.

Can’t have a quiet evening

I’ve been wondering why my Mac got so loud after upgrading to ML. iStat Pro shows why: the HD/Expansion slot fans are running high. I suspect this is due to the increased workload of the graphics cards (which aren’t doing anything dramatic besides displaying static content right now), but then again it might just be a random SMC bug. I don’t think they’ll ever fix this though, because it happens on the Mac Pro.

Can’t connect to remotely mounted DMGs

I got most of my data on an encrypted virtual disk. Prior to ML, I could connect to it from other Macs just as if it were a normal drive. If it was in /Volumes you could connect to it remotely. Now with ML you can only connect to physical remote drives. For no fucking reason at all.

Can’t start applications quickly

What you see there is the area of the screen where I’m waiting for Text Edit to load. On an 8-core Mac Pro with 16GB RAM and a fast SSD system drive. Yes, Text Edit. To be fair, non-Apple apps generally load faster, but this is a telling symptom I think. My guess: it has to do with the revolting iCloud “integration” all Apple apps have now.

If you own (reasonably recent) Apple Macs, upgrading to OS X 10.8 Mountain Lion does not feel like an optional step. If you’re like me, you’re always excited about new features and running the latest version of everything just gives you a warm fuzzy feeling all over. I heard a lot of people actually see OS X as a necessary evil of owning a Mac (which is completely false, as you can run Windows or Linux on them just fine), but to me the OS is the main reason why I’m a Mac user. But looking at Mountain Lion, it seems increasingly unlikely that I’ll remain a Mac user for the next decade or so.

Starting with Snow Leopard, OS X is apparently on a mission to disempower the user. Obvious bugs don’t get fixed, like the mysterious inability of Spotlight to find relevant files; sometimes it can’t even find Applications that sit right there in the /Applications/ folder. The full screen feature remains stubbornly useless on systems with multiple displays. If an app didn’t come from the App Store, Mountain Lion won’t let you run it. Actually it grudgingly lets you run non-Apple apps if you turn on some obscure setting in the System Preferences app, but who except me does this?

Then there are these things I can’t do on Mountain Lion that were possible before. I’m sure other users would have other things to add to that list.

Let’s Stop the Unix Time Insanity

The good old Unix timestamp is something I’ve been using as a developer forever. It’s a really neat concept when you get down to it: a simple integer number counting up the number of seconds passed since January 1st 1970. It works brilliantly, especially if your code can operate on the assumption that this number is based on UTC.

If you’re like me, you might have naively assumed that the following implementation was at work behind the scenes: first we have an integer I counting the number of seconds that have passed since a fixed date in a fixed time zone, then we use some tables and rules R() to generate, from a format Z, a human-readable date D suitable for a time zone T. So by calling R() we would get a sane representation of a date like this:

R(I, T, Z) = D
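In PHP terms, that naive model could be sketched like this. The function name render_time() is made up; DateTimeImmutable and DateTimeZone play the role of the “tables and rules”.

```php
<?php
// R(I, T, Z) = D as a function: integer timestamp, time zone name, format string.
function render_time(int $i, string $t, string $z): string
{
    // The '@' prefix makes PHP parse the number as a Unix timestamp in UTC.
    $utc = new DateTimeImmutable('@' . $i);
    return $utc->setTimezone(new DateTimeZone($t))->format($z);
}

echo render_time(0, 'UTC', 'Y-m-d H:i:s'), "\n"; // 1970-01-01 00:00:00
```

The integer itself knows nothing about time zones or formats; all the human-facing messiness lives in the rendering step.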

Those four final words cause all the insanity and headaches in relation to handling time on computers, including just recently the outages of some very high profile web sites. I know there is a lot of smugness going on in developer circles where people get high on posting comments such as “of course it is like this. it’s the way we’ve done it forever, it’s the only way.” and there will even be a lot of links to pretentious articles telling you 1001 things you supposedly didn’t know about time. But this isn’t about all that. Let’s put all the inevitable straw man posturing aside and look at the core problem: Unix timestamps are thoroughly and unnecessarily broken. They should be a continuum.

Everybody already assumes they are continuous, sometimes even kernel developers do. This is a reasonable assumption. Let’s make it reality. Leap seconds and other timing stuff belong firmly in the human-readable layer; the decision to include logic for this in the timestamp code itself was wrong. Countless developer hours have been wasted on it, for no good reason. Let’s throw all of that out.

I++;

Want me to remember your site? Fat chance.

Pieces about the decline of RSS are old hat by now, and they are probably correct in assuming that Joe Normalsurfer doesn’t care about subscriptions, he cares about social spam news instead and gets all his info from Facebook. However, this is not how it works for me – and I suspect I’m not the only one. I use my social networks for connecting with people, as opposed to sucking down marketing or news by the megabyte.

If your site doesn’t have an RSS feed, I’ll have forgotten about it by this time tomorrow. But this is, coincidentally, how site owners seem to pick their goals: have your five minutes of badly segmented random web traffic now and then be forgotten until you make the viral news once again next year. That’s a bad path to be on. You should be forging long-lasting relationships with your users, or at least with your power user nerd audience. A 100-word blurb on TC is not as effective as you might think. Spare yourself the stress and frustration of constantly spewing out me-too press releases and “news” in the vain hope of landing a viral hit. Build a relationship instead, bit by bit.

The only way to do that is to offer an RSS feed. No feed, no relationship.

Success or Failure: Analyzing 599 Kickstarter Projects

As news of ever-increasing pledge amounts for Kickstarter projects comes in daily, I find myself wondering. It sometimes becomes difficult to separate anecdotes from data. People (including me) tell other people to “just get some crowd funding”, but how likely is it really to raise any money there? I’m worried that for average project founders the chances of getting funding are pretty slim, while the “big” well-connected guys seem to make ridiculous amounts of money (and not always for a readily-apparent reason, either).

Scraping 599 Project Summaries
The Kickstarter site itself does a good job of hiding the severe limitations it places on browsing through projects. It’s not obvious at first glance how amazingly difficult it is to simply browse a list of projects which has not been filtered and pre-selected to show only those items most likely to succeed. But it’s true. Likewise, it’s not easy to get a list of completed projects, or even some data about failed ones. A lot of projects are on that site – but how many? Difficult to answer. At every corner, the UI does its best to show us only what we are supposed to see.

To get a better feel for the success chances of projects, I decided to pull project data from the “Ending Soon” list. I chose this list because – as far as I can tell – there is little if any filtering going on, it’s the only meaningful list on that site that gives you just the projects, sorted by ending date. I got 599 of them, that seems to be the limit of the endlessly scrolling page. This is the best option I could find short of writing a spider to scrape the site.

The Data
Here is the raw data from this evening. It’s a list of projects, showing how far along they are both in time and funding. The Kickstarter site makes it difficult to know how long a project has been going on, but at least we know when its deadline is. Using an average project duration of 20 days, it’s possible to guess whether a project is going to fail or not. Of course, the closer a project comes to its deadline the more accurate that assessment becomes.

At first glance this shows (on a log10 scale) the projected pledge amounts versus project minimum funding goals of all projects that have 6 or fewer days to go until deadline. Everything south of the blue line is going to fail. As you can see, failed projects often miss their minimum funding threshold by about one order of magnitude; it’s uncommon to fail by only a small margin. Successful projects, here shown as above the blue line, seem to be more heterogeneous but apparently often manage to get more than double their threshold funded.

Of the 599 projects sampled, 335 are projected to succeed while 264 will most likely fail. This means you can expect a 56% chance of getting your funding when applying on Kickstarter.

Let’s have a look at success chances as segmented by project magnitude (aka. the funding goal):

Funding goal     < $1000   < $10000   < $100000   < $1000000
Successes             63        211          57            4
Overfund. Ø         173%       160%        152%         125%
Failures              19        179          64            2
Underfund. Ø         45%        39%         41%          26%
Success Ratio        77%        54%         47%          67%

Now, while the results in the top magnitude bracket are not really statistically significant, it’s apparent that most successful projects are overfunded by at least 50%, even if they are relatively large! Failing projects, on the other hand, can expect to be underfunded by about the same percentage.
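For anyone who wants to check the arithmetic, the success ratios follow directly from the success and failure counts in the table. The snippet below is just that calculation. The variable names are mine; the numbers are the ones from the table.

```php
<?php
// Counts per funding-goal bracket, copied from the table.
$successes = ['< $1000' => 63, '< $10000' => 211, '< $100000' => 57, '< $1000000' => 4];
$failures  = ['< $1000' => 19, '< $10000' => 179, '< $100000' => 64, '< $1000000' => 2];

$ratios = [];
foreach ($successes as $bracket => $won) {
    $total = $won + $failures[$bracket];
    $ratios[$bracket] = (int) round(100 * $won / $total);
}

$allWon  = array_sum($successes);
$allLost = array_sum($failures);
$overall = (int) round(100 * $allWon / ($allWon + $allLost));

foreach ($ratios as $bracket => $pct) {
    printf("%-12s %3d%%\n", $bracket, $pct);
}
printf("%-12s %3d%% of %d projects\n", 'overall', $overall, $allWon + $allLost);
```

The top bracket rests on only 6 projects, which is why its 67% should not be taken too seriously.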

Among the failed projects, there was an average 0.42 orders of magnitude shortage between the funding goal and the actual pledge amount (standard deviation 0.29). Successful projects overshot their funding goal by an average 0.17 orders of magnitude (standard deviation 0.21).

So what does it mean?
After a lot of hand waving, you can assume you’ll have a 50% chance of getting funded on Kickstarter. While bigger projects are less likely to succeed, the drop-off in likelihood is NOT linear with project size – so it’s probably a good idea to aim high. If your project is bumbling along at 50% or less projected funding: don’t expect any last-minute miracles, because the data does not show an increase in funding activity as the deadline approaches. It looks like in most cases, you should be able to tell after a few days whether a project is going to fail or succeed.

I believe there is more interesting data buried way deeper inside Kickstarter. For example, the question whether an “average Joe” type of person has a realistic chance of getting funded is still open. I suspect you’ll probably have an extremely slim chance if you don’t have some social media heavyweights ready to back you and gently steer the crowd towards your project. If this hypothesis is true, it would mean Kickstarter is something of a bubble movement (as in: a mass movement driven by dubious value assessments due to the influence of parties who are experiencing a conflict of interest). It would be interesting to look at Twitter data in this respect – maybe another day.

Update: Dan Misener seems to agree.

Project Glass: A Concept Car Divorced From Reality

Augmented Reality is very cool and I believe it’s the future of displays, period. As a futurist I’m even trying to make peace with the repeating bit of history where a company comes in and implements a sci-fi idea only to then market the thing as though they invented the concept themselves. They did not. Nothing Google is showing you is original. But that’s not the problem.

The problem is technological. First, the glasses themselves: I’d be very surprised if they even exist yet. Right now, there are basically three ways to make these glasses happen. These are, from worst to best:

1) Projecting the image onto a surface near your eye. The resulting picture will always have focus and depth-of-field problems. These kinds of displays get on your nerves very fast. Also, you’ll have to live with some sort of clunky display between your eye and your environment. They can make the surface smaller so as to be more unobtrusive, but then the screen area shrinks with it as well.

2) Projecting the image from the LCD directly into your eye. This still has a lot of quality problems, but it should feel more natural than the surface projection HUD. However, the trade-off is that you can’t make an area of the user’s viewport dark. You can only add light to the retina, not subtract it.

3) Using a laser to project the image dot-by-dot onto your retina. Of all the methods, this will probably lead to the most natural image. This method does share the drawback of 2) – also, for all intents and purposes this technology simply does not work in the field yet.

Judging from Google’s video, which is light on technical details, they’re probably using method 1). Be prepared then: these glasses won’t at all feel like the overlay scenes from the movie.

The next big issue is the input channel. I see they opted for voice input, just in case people are not already fed up with disruptive Bluetooth douche bags and other assholes prancing around talking to Siri. This will be horrible and it will lead to more technology being banned from more public places. The right thing to do here would be to employ a real brain-computer interface (BCI), for example EEG-style electrodes. Barring that, a muscle sensor would be acceptable, too. Anything but the dreaded voice interface will do, actually.

As cool as the idea is, I’m skeptical about the approach and the way it’s being marketed.

CAPITAL C – how the crowd liberates itself

Looking forward to it:

Asking the crowd for support to make your vision become a reality is no longer the nerdy pipe dream of a bunch of Internet geeks. Far from it!

According to MIT Professor Eric von Hippel, crowd sourcing as well as crowd funding are growing into no less than “the biggest paradigm shift in innovation since the Industrial Revolution.”

And it happens right now: Every day the number of innovators opting to leverage the power of the crowd (instead of relying on conventional forms of financing) is growing rapidly.

CAPITAL C – how the crowd liberates itself is the very first feature-length documentary about crowd funding, how it is changing the world of today and how it will shape our future. It’s about a new breed of independent spirit created by us, the crowd.

So, whether you are a project developer, a supporter of crowd funded projects or just curious about the whole idea: This one goes out to you!

A Bullshit Documentary About the Fake Dangers of Cell Phone Radiation

There is a documentary called “Full Signal” that was being broadcast over the holidays portraying the perceived dangers of radiation given off by wireless data devices. It’s really surprising how such an obvious piece of bullshit is given a broad audience like that. I can’t help but notice how these scaremongering garbage reports all follow a distinct narrative pattern: First let’s invoke a stream of baseless but scary possibilities. Then argue that, while you’re not asserting that any of these bullshit stories are actually true, the danger of dismissing them is not a risk worth taking. As you would expect, this point is driven home by activist moms who love to appear on TV in order to make random “won’t someone think of the children!!1!” arguments.

Many bullshit documentaries try to immunize themselves from critique by using an array of useful idiot “experts” (such as medical doctors, who more often than not receive abysmal science training to begin with) and moral meat shields like children, nice old ladies, or sick people. It’s been a while though since I’ve seen a piece of shit documentary like “Full Signal” pull this many ridiculous stunts, including implying that cell towers are antisemitic. I kid you not. I was having a hard time deciding whether I was watching a parody piece.

Radiation Is an Evocative Word

Fact is, the kind of electromagnetic radiation used for cellular data links does get deposited into living tissue. As these waves of energy pass through cells, they sometimes get absorbed by molecules – most often they bump into water molecules. Every time they interact like this, they give the molecule a push and cause it to zoom around faster. This is what we commonly call heat energy. Heating stuff up is the basis of our civilization. We use it to cook food, for example. An indirect way to heat food is the cooking oven, which first produces heat and then transfers it into the food through the cooking gear. A direct way to heat stuff is to put it into a microwave oven. Here, the same kind of electromagnetic radiation heats up the water molecules directly.

Bullshit artists like the ones behind “Full Signal” use exactly this image to scare people with absurd brain farts like, and I quote, “you don’t want to bake your brain like you would a potato in a microwave oven”.

Compared to pretty much any other heat source imaginable, cell phones are incredibly weak. On a bright day, the sun puts about 1000 Watts down on 1 square meter of ground. Most microwave ovens produce about 800 Watts in total. A cell phone puts out at most 1 Watt. And most of that is not even absorbed by any tissue at all. That’s the reason why people don’t usually burst into flames when they use an iPhone. It’s also worth mentioning that the heat produced by a cell phone’s electronics and battery by far outshines the antenna. Electronics feel warm to the touch because they produce heat, not because their antennae are secretly cooking your hand from the inside.

Manipulating the Audience and Preaching to the Choir

Intellectually bankrupt crapumentaries like “Full Signal” make it a point to criticize statements made by (I quote) “physicists and engineer types” and love to point out that nobody listens to doctors and the concerned public in general. As I said before, physicians are often not real scientists but nevertheless are perceived as competent and trustworthy on pretty much any subject matter. It’s even worse with the general public.

People get airtime and credibility simply because they’re parents, or because they believe something very strongly. No person should get blanket credibility for anything. Of course arguments from authority are always very popular and very few viewers have the time or the energy to dive into the science themselves. In the same territory, parading an array of highly questionable studies is a similar if more sophisticated maneuver. Both tactics are used extensively in this documentary.

“Physicists and engineer types” can provide theoretical and empirical insights. Concerned mothers and professors desperate for grant money cannot, but they do look incredibly sincere on TV.

Playing Whack-A-Mole with Causality and Logic

As an example, I’d like to grab an argument that was used by one of these so-called experts from the documentary to illustrate the point: a person appears on screen, talking about the dangers of smoking cigarettes as an analogy for how EM radiation supposedly harms people. Right off the bat, the unspoken premise is that EM radiation is in fact equivalent to cigarette smoke. Next, the very core concepts of statistics and biology are tweaked to make an assertion about cigarette smoke that sounds plausible for about one second until the viewer’s brain kicks in: some people get cancer from 5 cigarettes per week, whereas other people don’t ever get cancer from smoking 3 packs a day – the “expert” asserts that this is due to individual sensitivity. In reality, this dynamic is driven by the number of cigarettes more than any other factor.

They lie about this in order to nudge the viewer into believing that doses don’t matter and to lay the groundwork for an argument about this mysteriously inscrutable individual sensitivity to EM radiation professed by some utter lunatics throughout the show.

Insanity for Fun and Profit

Not a single intelligent life form was interviewed in this movie. The people who did make it on screen delivered a treasure trove of idiocies, biases, fallacies, and in some cases straight-out psychoses. Far too many idiotic moments happened to paraphrase them here, but come to think of it I would like to encourage people to show this film during psychiatry training. If you think this is an excessive claim, just watch how people suffering from “electro hypersensitivity” conduct their lives – many homeless schizophrenics are actually saner than that. At least vagrant psychotics don’t write books about it in a pathetic attempt to evangelize their lifestyle, trying to transform their insurmountable psychosomatic suffering into a life philosophy. Attention-seeking assholes, every single one of them.

Homeopathy, traditional Chinese medicine, dubious nutrition philosophies, dowsing, EM phobia… it’s all the same thing and, I suspect, in many cases being practiced by the same people.