Udo’s Twitch Streaming Guide for Mac

I don’t know how many Ludum Darers are Mac users, but it’s sure not common to see them doing Twitch feeds. Part of the problem might be that it takes a lot of effort and fiddling to set it up properly. So for the benefit of everyone who’s interested in doing it, here’s what I’ve learned.

You’re going to need a few pieces of software, but the good news is that they’re all free or low-cost.

Screen Capture: CamTwist (free)

You can get it at camtwiststudio.com. It does a pretty decent job of capturing video content off the screen in real time, and it also can do some basic compositing. It’s not the most stable software, however, so I recommend running it at a low frame rate. Open CamTwist and double-click on Desktop, full screen, and if necessary select the screen you want to capture.

[Screenshot: CamTwist]

Audio Bus / Routing: Soundflower (free)

Download Soundflower from http://code.google.com/p/soundflower/ and install it. It comes with SoundflowerBed, a simple config application. Start it and route the 2CH bus to your speakers.

[Screenshot: SoundflowerBed]

Open System Preferences and select the 2CH bus as your default audio output. This causes all apps to send their audio to Soundflower 2CH. Since you already routed 2CH to your speakers earlier, this should enable you to hear application sounds normally – they’re just routed through the Soundflower bus now. If you can’t hear anything, fiddle with the volume – including the input volume of the 2CH bus (which you can also find on the same settings window).

[Screenshot: audio settings]

That’s the most basic audio setup imaginable; we’ll get to the good stuff later. All this enables you to do right now is stream your screen and all your audio output. We just need an app to take this data and send it to Twitch:

Streaming: Flash Media Live Encoder (free)

You can download Flash Media Live Encoder from Adobe. It’s a hassle though, so save your install image somewhere safe in case you need it again. Then open your Twitch account in a browser and download the streaming profile they conveniently provide. It’s an XML file; store it somewhere convenient. Start the Live Encoder, go to the File menu and load this XML file. It sets up the basics for streaming including your private streaming key, but you can still fiddle with the settings.

Select CamTwist as the video device, and Soundflower 2CH as the audio device.

[Screenshot: Flash Media Live Encoder]

If everything is set up correctly, you’re going to see a preview of your video stream showing your desktop, and the audio bars to the left of the preview should fluctuate when you play some music. Hit the “Start” button and see if you can stream to Twitch alright.

Congrats, you have completed a basic setup! But there’s more. You might want to also talk on your videos, and you might want to mix output levels of different apps a little. Sadly and incomprehensibly, OS X has no way of doing this on its own, so we need to download more software:

Microphone Routing: LineIn (free)

Download LineIn from https://www.rogueamoeba.com/freebies/ and install it. It allows you to route input from your microphone into the Soundflower bus. However, LineIn has a stunning limitation: it can only do one route at a time. This is bad because you don’t want to route your microphone input to your speakers (it’ll cause a nasty feedback screech), you only want to route it to the actual live stream. Luckily, you can trick LineIn into doing this by duplicating the app. That’s it, just use “Duplicate” from the context menu and you’ll have 2 LineIn programs. Launch them both.

With the first instance, route your microphone to Soundflower 16CH. With the second, route Soundflower 2CH to 16CH:

[Screenshot: the two LineIn instances]

Activate Pass Thru on both. You should now see the indicator bars lighting up when you make a sound in the room.

Go back to Flash Media Live Encoder and choose Soundflower 16CH as your audio device.

With this maneuver we have tricked both Soundflower and LineIn into doing something their developers didn’t have in mind, but it works beautifully: the 16CH bus is now strictly for our streaming, but the 2CH bus contains all non-microphone audio which we can directly listen to. To stop streaming the mic, just deactivate “Pass Thru” on the first LineIn instance until you need it again.

Mixing: SoundBunny ($10)

Per-application audio levels are again something OS X can’t do on its own, so we’re going to need yet another app for that. Go download SoundBunny for Mac; it’s $9.99.

[Screenshot: SoundBunny]

Now you get these nifty sliders that allow you to actually control sound levels individually.

And that concludes my Mac streaming guide. As you can see, due to stunning limitations of the OS X audio system, we need a lot of extra cruft to accomplish some basic mixing and routing, but the apps I showed here do an excellent job of it. The only downside is you need to have them all running, and it takes a few minutes from cold start until you’re all set up and ready to stream. Remember to save your settings, especially the Media Live Encoder ones, for the next time.

Webcam Window: CamTwist or VLC

One little extra thing I like to do is show a little webcam feed in the corner as well. You can do this with CamTwist, but it crashes my system when I try it. If that’s also a problem for you, use VLC. Click on “Open Capture Device” in the menu and you’ll be able to select your Mac’s camera as an input device. Use VLC’s settings to make the video window always-on-top and remove the window’s title bar. It should now be an unobtrusive little frame that you can place anywhere you want on your streaming desktop.

[Screenshot: VLC webcam window]

Getting around the iOS 7 activation error: how to fix things and unbrick if you’re in activation hell

When using Apple’s “Find my iPhone/iPad/Mac” software, you’ll have to weigh the risks against each other – what’s more likely: the device getting stolen or Apple bricking it? I made that choice for my iPad and Macbook Pro. Since I’m lugging them around everywhere I figured if they get stolen I could at least remote-erase them.

The other side of the argument is of course that if your iCloud account ever becomes compromised, you could very well end up with a very expensive paper weight that even Apple can’t fix. Or, if a software malfunction occurs (I know, what are the chances) you’ll end up with a bricked device that may or may not be restorable.

So last night my iPad 4 (iOS 7) all of a sudden displayed an activation dialog, prompting me to identify myself using the iCloud account associated with the device. Despite repeatedly filling out the form I was getting nowhere, and from time to time it showed me the error message “Could not activate iPad. Your iPad could not be activated because the activation server is temporarily unavailable”. As far as I can tell this message is a lie, there is nothing wrong with Apple’s activation server and the condition is sure as hell not temporary.

Attempts at reactivating, re-installing, or resetting the thing with iTunes, though suggested by Apple, also failed. A device in this state seemingly can’t be restored using the normal iTunes options. After all this has failed, this is the point where you pray that you don’t have any un-backed-up data on the iProduct. This is probably also what they would do at the Apple Store if you brought it in for service:

Turn the device all the way off by pressing both buttons until the screen goes dark. Plug the iPad/iPhone into your computer running iTunes. Start it by pressing the home button and keep it pressed until a message appears prompting you to connect with iTunes. At this point, iTunes should recognize the device and walk you through the re-install process. All data on the device will be wiped and you’ll have to re-install from backup, or if you don’t have any: start over with a blank install.

After this is completed, iTunes will ask you to finally activate the device (since it’s been blocked). Contrary to all the other activation screens you will have seen at this point, this one will actually work! At least it did for me. When this is done, it’s a simple matter of restoring all the apps and settings and then your iDevice is open for business again!

Obviously, this is all at your own risk, but it did work in my case at least. For me, this incident was a wake up call. I’m not sure I could restore my Macbook so easily if something similar happened. For myself, I decided this little bit of extra security is not worth the risk of being stranded without working gear in a potentially very inconvenient place at a very inopportune time. It’s also not worth the risk of losing data and, potentially, the whole device.

Lesson learned, “Find My iThing”: off.

Project LaunchWay: the Case for a Community-Based Incubator

I want to try something, here goes:

nReduce was a virtual startup incubator. It recently closed down. In many ways, nReduce was a pioneer. It allowed would-be startup founders to join a curated online community that promised to give them a support network and a social platform while they were developing their ideas into products and companies. More importantly, nReduce brought together like-minded people who were doing the same thing, facing the same challenges.

In this respect, we should look back on nReduce not as a failed idea, but as an experiment in a space that has only just begun opening up.

One central lesson is that online incubators (if that’s even the right word) are fundamentally different from physical startup incubators. Online incubators are social frameworks for founders helping each other out as they go through the same experience. They’re a motivational tool, a communication platform, and a way to get very early feedback from smart people. Online incubators are not – at present – virtual farms where VCs and angel investors develop a hand-selected batch of startup candidates with funding and influence.

I believe a true community-based online incubator can shine in this space if it knows its true strengths and doesn’t aim to solve problems it’s badly positioned for.

LaunchWay is an attempt at building a Ludum Dare for startups and software projects. It is not YCombinator, or Tech Stars, nor does it aim to be. It’s a platform where people can incubate their ideas in a nurturing community. Mechanisms such as a karma score and weekly deadlines are designed to provide additional motivation and feedback.

The first class of winter 2013 will start on the 1st of November 2013. During the next three months, startups will be encouraged to take part in weekly progress reports and to blog about their experience. At the end of the class, we will hold an online Demo Week where all the projects will be showcased. Where we go from there is completely up to the founders.

Let’s build a community-powered online startup incubator!

Register now for the LaunchWay class of W13

Press Release Translation Table

What they say | What they mean
big news everyone! | time to say goodbye
I’m excited to tell you that we joined [big company] | we have been acqui-hired and my bank account feels very good right about now
we’ll continue to work on [product] | development will be stagnating and after the last of the core team has left, the product will be discontinued.
a lot of exciting features are planned | we’re about to introduce some questionable features
it’s been a great ride | our funding was about to run out
our passion still is bringing [goal] to you with [product] | we’re tired of this whole thing and want to do something else
together with our friends at [big company] who are just like us [insert reason] | it’s a slow-moving corporate monoculture that engages in a lot of backstabbing but thankfully I’ll be out of there as soon as my contract runs out
thanks to all our customers and partners | we like you but we need to pay our bills
we’ll continue to offer [product] | we will be closing / discontinuing very soon
we have only just begun / you have seen nothing yet | you will never hear from us again
stay tuned | …for our official closing-down announcement.

That said, I don’t think being acquired or acqui-hired is necessarily a bad process. Startups need to make money and founders need to have the prospect of a decent future. That is, after all, one of the prevailing reasons why people do startups. The big companies that are doing the acquiring also have valid reasons: they often have more money than they know what to do with, and they need to demonstrate forward thinking to their stock holders. Innovation rarely comes out of big companies, but they are in equal parts threatened and enabled by it – so it makes a lot of sense to keep the marketplace small and devoid of powerful newcomers.

The process does raise a few questions about the long-term viability of the user as a commodity, though. By now, people know that new services are there only for a short while until they either go under or get acqui-killed by another company. I think in the coming years we’ll see increased reluctance among users to invest time and work into filling a startup’s database.

PHPitfalls

After reading the “Securing PHP” article written by James Cunningham I thought I might gather a few points about using PHP from the developer side. Keep in mind that I’m not a security expert. However, this article contains a few starting points on preventing exploits, making PHP apps perform better, and miscellaneous stuff that I consider to be best practices. Your mileage may (and probably will) vary, so as always: take everything with a grain of salt no matter if you read it here or elsewhere. This is not so much a check list of concrete actions, it’s more a collection of points worth keeping in mind as you code.

PHP? WTF possessed you?!?

The first thing you come across when starting out with PHP is probably the fact that it has an extremely bad reputation. You will hear lots of things, including “it doesn’t scale”, “it’s not a real language”, “it doesn’t have X so it sucks”, “it’s not safe”, or “it’s Blub and you’re too stupid to realize it’s Blub”. From the perspective of the language runtime itself, this is all a lot of crap. Still, the trolls are often correct – though it’s generally not PHP’s fault per se. The blame lies solely with the developers using it. As a PHP adept, this should comfort you because it’s something that can be fixed on your end. You can also derive consolation from the fact that other web languages and frameworks suffer from the same problems, it’s just not generally advertised. The bad news is that PHP application failures are huge and numerous, because the language is both popular and powerful enough to enable truly epic bugs.

The Basics: How PHP Works

Because PHP is so accessible and ubiquitous, there are a lot of people copying and pasting scripts together – people who in a more perfect world would be forbidden by law from ever touching source code. Even when real developers are doing it, hacking an easy language does not absolve anybody from the responsibility of knowing what goes on inside a system behind the scenes.

At the most fundamental level, the webserver hands a request from a browser over to the PHP runtime. This sounds like a really simple concept and for the most part, it is. Nowadays, most serious web servers are configured to shove requests directly into the gaping maw of a long-running PHP process dispatcher. After PHP is done with its thing, it passes the result data back to the webserver, which in turn hands it over to the client browser. Historically, this was not always the case: in the past, web servers often started up a complete PHP instance just for one request. If this sounds inefficient to you, you are absolutely right. After people realized how wasteful this method was, they adopted the current model of re-using PHP instances after they completed their jobs. However, some mass-market ISP hosting plans still use the older model, all the more reason to keep an eye on your code performance at all times.

A clean slate
Every developer should internalize this: fundamentally, PHP is a per-request environment. Whatever you did during the last request, the next one will start with a completely blank slate. This stateless paradigm is not very common as web languages go. Many others operate as a persistent environment. PHP’s way of doing things is both awesome and problematic, depending on your use case. On the plus side, this allows developers to look at each request as an isolated problem. Also, it’s much more difficult to make a mistake that takes down the entire server. There are fewer memory leaks and other weird effects that come from having a stateful runtime. But on the negative side it also means developers must understand that rebuilding the entire environment for every request comes at a computational cost. Many PHP apps are slow because developers did not consider this cost. It’s our responsibility to keep that startup cost low, so doing as little initialization as possible upfront is always a sensible concept.

Things to avoid:
Anything that smells of gratuitous initialization procedures. Don’t load, check, connect, or compute anything that isn’t needed. PHP is not designed to fulfill your dream of becoming an OOP purist. Avoid huge class hierarchies: keep in mind that all this structure has to be parsed and then instantiated at every request. It’s often better to have a very flat class system. Don’t store big amounts of data in the $_SESSION variable because it, too, has to be reloaded on every request. Unless you’re sure your web server does opcode caching (with APC for example), don’t use huge files full of unnecessary source code.

Things to Do:
Procedural programming is not necessarily evil. Do it whenever it has speed and/or simplicity advantages. Only require()/include() files that are definitely needed. Consider using a class loader (carefully) to load functionality modules on demand instead of monolithically including all your stuff upfront.
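A minimal sketch of the on-demand loading idea, built on PHP’s spl_autoload_register(). The function name registerLazyLoader() and the one-class-per-file layout are assumptions made up for illustration, not a prescribed structure:

```php
<?php
// Hypothetical layout: each class lives in its own file inside $baseDir,
// e.g. class ReportBuilder -> $baseDir/ReportBuilder.php
function registerLazyLoader($baseDir)
{
    spl_autoload_register(function ($className) use ($baseDir) {
        // Only accept plain class names so the loader can't be steered
        // towards files outside $baseDir.
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $className)) {
            return;
        }
        $file = rtrim($baseDir, '/') . '/' . $className . '.php';
        if (is_file($file)) {
            require $file; // parsed only when the class is first used
        }
    });
}
```

After calling registerLazyLoader(__DIR__ . '/lib') once, a class file is only read and parsed when a request actually touches that class, which keeps the per-request startup cost down.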

Things to know:
- HTTP request headers: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields
- CGI interface, variables: http://en.wikipedia.org/wiki/Common_Gateway_Interface

Profiling, profiling, profiling…

PHP makes it easy for you to track the amount of processing time and memory your app is using. It is absolutely essential to track this. Intuitively, I’d say that the execution speed of PHP falls somewhere between Ruby and JavaScript(V8) and it’s easy to make mistakes that end up using a lot of memory and/or valuable CPU time.

You don’t even need serverside debuggers or fancy instrumentation to achieve this. The function microtime() returns the current Unix timestamp with microsecond precision (call it as microtime(true) to get a float you can do arithmetic with), and memory_get_usage() gives you a basic idea about your script’s memory behavior. This makes it easy for you to check the two most fundamental resources at key points during your application’s execution path.

Personally, I like to use an extremely simple profiling function based on the microtime() function. Using a function like this to profile your code will allow you to measure how horrible certain operations really are. For example, connecting to a database. Running regular expressions. It all has a cost and you need to know what it is. It’s always good to know what’s going on behind the scenes – so avoid libraries that obfuscate their behavior for the sake of fake simplicity.
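A minimal sketch of what such a helper could look like. The name profilePoint() and the shape of the log entries are made up for illustration; only microtime() and memory_get_usage() come from PHP itself:

```php
<?php
// Hypothetical helper: records elapsed time and memory at named checkpoints
// and returns everything collected so far.
function profilePoint($label)
{
    static $start = null;
    static $log = array();

    $now = microtime(true); // float seconds with microsecond precision
    if ($start === null) {
        $start = $now;      // first call defines time zero
    }
    $log[] = array(
        'label'  => $label,
        'ms'     => round(($now - $start) * 1000, 3), // milliseconds since first call
        'memory' => memory_get_usage(),               // bytes currently allocated
    );
    return $log;
}
```

Call profilePoint('start') at the top of a request and again after suspicious steps such as profilePoint('db connected'); the difference between two 'ms' values tells you what that step cost.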

For more serious profiling and debugging, you should check out: http://xdebug.org/ but microtime() still provides you with valuable and quick information in any PHP environment.

Sane(r) Input and Output

One of the most common publicly visible mistakes is failure to sanitize the input of a web application. It’s important to remember that every single piece of data coming into your app is potentially hazardous. The web really is out to get you! In PHP, the $_REQUEST array contains all the parameters pertaining to the current request and it’s essential not to trust them. Sadly, there is no single way to make this data safe. It depends on what you do with it. On a more positive note, the handling of user input generally falls into one or more of the following categories and there are standard practices you can employ to avoid the worst:

>>> Rule 1 of input data hygiene: Nuke it from orbit, it’s the only way to make sure! <<<

Displaying Data
In the most common scenario, a user submits some kind of text to your application and the app in turn displays that text on the site. Naively displaying whatever the user put in opens a huge opportunity for attacks on your site with an XSS exploit. Thankfully, it’s easy to sanitize this kind of data. In most cases, htmlspecialchars() will take a text and render it harmless by escaping angle brackets and other problematic characters. But in some cases you might want to allow the user to enter markup instead of just plain text. In theory, PHP lets you specify a set of “good” tags and filter all the other ones out with strip_tags(), but this function is horribly unsafe because it allows malicious users to sneak JavaScript event attributes into the allowed tags. That means you have to use something to strip those attributes out as well (there are some examples in the PHP documentation); however, this is not trivial. In fact, I believe it’s the single biggest reason why bulletin boards started their own markup language known as BBCode.
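For the plain-text case, a small sketch of the escaping step. The wrapper name escapeForHtml() is made up; the flags are one reasonable choice, not the only one:

```php
<?php
// Escape user-supplied text right before it is written into an HTML page.
function escapeForHtml($text)
{
    // ENT_QUOTES also escapes single quotes, which matters when the text
    // ends up inside an attribute value.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

With this, a submitted <script> tag arrives in the browser as visible text instead of as markup.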

Text in Databases
SQL database queries come with a little bit of baggage. Often, you will need to take user input and run queries with it. For example, you might want to store data in the DB or you might want to retrieve it. Most database abstraction libraries will let you use a syntax like this:

SELECT * FROM articles WHERE id = ?

The nice part of having support for placeholders like “?” is that you don’t need to worry about making the content of your variable safe. You can just pass it as another parameter to your query function. Depending on the database and the library used, this might also have the further advantage of enabling the database to precompile the query and execute subsequent queries a bit faster.
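Here is a sketch of that placeholder style using PDO. It assumes the PDO extension with the SQLite driver is installed; the in-memory database, the table contents and both function names exist only for this demo:

```php
<?php
// Look up one article; the id travels as a bound parameter.
function findArticleById(PDO $db, $id)
{
    $stmt = $db->prepare('SELECT * FROM articles WHERE id = ?');
    $stmt->execute(array($id)); // the value is bound, never spliced into the SQL
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Demo-only setup (assumption): a throwaway in-memory SQLite database.
function makeDemoDatabase()
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)');
    $db->exec("INSERT INTO articles (id, title) VALUES (1, 'Hello')");
    return $db;
}
```

A classic injection attempt such as "1 OR 1=1" is treated as a single odd-looking value here and simply matches nothing.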

There are, however, situations where you might not want to use or be prevented from using an abstraction layer or library. Using the built-in MySQL functions can work, too, you just need to be more careful. You do have to take care of properly escaping the variables yourself. There is just one single function that you can use to make data safe for MySQL consumption and its name is mysql_real_escape_string(). If you’re using any other function, stop it immediately. mysql_real_escape_string() is your one and only true lord and savior: worship the blessed bytes it churns out.

Dynamic Code Paths

This is a touchy subject. PHP allows you to do a lot more things with your code based on variables than most other languages. If you’re going to use those features, make sure to have a very good reason for it. If done right, those features can substantially reduce the complexity of your code – but you have to be extremely careful.

Dynamic include()s: PHP allows you to include a file specified in a variable. You can do stuff like this: include('inc/'.$moduleName.'.php');. Contrary to many other people, I think it’s fine to use this feature in principle because it allows you to introduce very simple extension mechanisms into your app and it can help keep your codebase clean. But as always, with this kind of power, comes a huge responsibility: you have to make sure $moduleName is legit and can’t be used to call arbitrary code on your server. A good way to ensure basic sanity inside this variable is to use at least basename($moduleName) on it, but a much better solution would be to strip out any non-alphabetic characters. Nukes. Orbits. See above!
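The “strip everything that isn’t a letter” approach could be sketched like this. The function name resolveModulePath() and the extra file-existence check are assumptions added for illustration:

```php
<?php
// Reduce a requested module name to letters only, then confirm the file
// really exists inside the module directory before anything is included.
function resolveModulePath($moduleName, $moduleDir)
{
    $clean = preg_replace('/[^A-Za-z]/', '', (string) $moduleName);
    if ($clean === '') {
        return null; // nothing usable left after cleaning
    }
    $path = rtrim($moduleDir, '/') . '/' . $clean . '.php';
    return is_file($path) ? $path : null;
}
```

The calling code would then only include() the result when it is not null, for example after resolveModulePath($_REQUEST['module'], __DIR__ . '/inc'), where the parameter name "module" is again just an example.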

Dynamic variables and functions: In PHP, you can set the content of a variable $v by specifying its name through another variable. For example, if you set $nnn = 'vvv';, you can do $$nnn to access $vvv. But wait, there is more. Suppose you have a function vvv();, you can call this function by writing it as $nnn();. Obviously, this is very powerful stuff, so you have to make sure the “id” variable (in this example $nnn) is sane and can’t be used from the outside to call arbitrary stuff in your app. Contrary to the previous methods, there is no single way to make sure this code can’t be abused: you’ll have to make sure of it in a manner that is appropriate to your code specifically.
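One possible way to keep variable function calls on a leash is an explicit allow-list. This is only a sketch; the handler functions and dispatchAction() are hypothetical names, and your app may need a different mechanism:

```php
<?php
// Hypothetical handlers an app might expose through a request parameter.
function actionList() { return 'listing'; }
function actionShow() { return 'showing'; }

// Only names on the explicit allow-list are ever used as a variable function.
function dispatchAction($requested)
{
    $allowed = array('list' => 'actionList', 'show' => 'actionShow');
    if (!is_string($requested) || !isset($allowed[$requested])) {
        return null; // unknown or malformed request: call nothing
    }
    $fn = $allowed[$requested];
    return $fn(); // variable function call: invokes the function named in $fn
}
```

Because the outside world only ever supplies a key, never a function name, a request for something like phpinfo goes nowhere.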

Eval is evil: for some unknowable reason, many newbie developers seem to be enthralled by eval(), probably because they failed their saving throw against its evil whisperings of doom at some point. In any case, eval() is probably the one function responsible for the most colorful WTFs in coding history.

eval(); allows you to execute arbitrary code by invoking an interpreter inside the interpreter. Developers often use this to implement dynamic features, such as event handlers that can be specified by the users of an application. The dangers here are obvious: there is no way to make this safe. If you allow your users to specify custom code, you damn well better make sure they’re trustworthy, because they will have access to anything and everything on your server. I believe in 99% of all cases, the desire to use eval(); is not even remotely legitimate. However, there might be scenarios where eval() is justified, for example in CMS applications or meta-programming projects. For PHP beginners and intermediates there is just one rule regarding eval(): if you’re using it, you’re doing it wrong. It’s that simple. Don’t listen to the voices!

Calling the shell
Sometimes, when a specific and arcane piece of data processing is required, it can be a good idea to just call a Unix text command from inside your PHP application. If you do that, you have to be aware that your code has a high likelihood of being specific to your server configuration. Chances are, your call won’t work on another configuration. Whenever some tiny aspect of your shell call is influenced by user data, you have to use escapeshellarg() and escapeshellcmd() to sanitize those values.
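A small sketch of the argument-escaping part, assuming a Unix-like server where wc is installed. The function name buildLineCountCommand() is made up, and the command is only built here, not executed:

```php
<?php
// Build a command line in which the user-influenced value is passed as one
// single, quoted argument.
function buildLineCountCommand($filePath)
{
    return 'wc -l ' . escapeshellarg($filePath);
}
```

On a Unix-like system, an input such as "notes.txt; rm -rf /" ends up wrapped in single quotes, so the shell sees one strange file name rather than a second command.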

Regular expressions
Regular expressions are very practical, small and can be highly performant if done right. They do, however, require a specialty tech priest to come in and bless the code. You can’t just copy an arcane regex ritual from the web and expect it to work on your project. Regexes often sit there and look like they’re working, but in reality they’re just lying dormant until they finally betray you at a more opportune time. It’s ridiculously easy to get them wrong, and they don’t lend themselves to bug-spotting at one glance. You probably should not rely on regular expressions to sanitize your data, unless you know exactly what your expression does. Because even if you think you know what it does, chances are high it will do weird things with weird inputs. Even the Supreme Grandmaster Regex Scholars of Perldom get this stuff wrong with a scary high probability. Personally, I’d advise you to steer clear of security-relevant regexes if the same can be done with a few simple and more maintainable lines of real code instead. Orbits. Nukes. Swords. Gunfights.

What to know:
- Regular Expressions: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

Databases

We already talked about database security. As always, there is more. In most cases your database will be MySQL. Like PHP, MySQL has a very bad reputation – so it’s only natural that those two should pair up to be the web server standard configuration of the civilized world. MySQL will be fine for most of your standard storage and query needs. A lot of people tend to prematurely optimize their web projects and choose a NoSQL solution because they believe it’s going to be faster, or even simply because “the cool kids are doing it”. Needless to say, this is not actually warranted in most cases. PHP allows you to go with the database you like the most – if that’s MySQL, you’ll save a lot of configuration trouble. If it’s not MySQL, that’s fine, too. But chances are very good you don’t need a special DB solution for your web project so unless you’re experimenting you’ll probably get the most mileage out of something you already know very well.

What to avoid:
Avoid making a database connection when your request doesn’t need it. And when your request does need it, you should prefer persistent database connections over ad-hoc ones (e.g., use mysql_pconnect()) for the simple reason that it potentially shaves some execution time off the connection part. Avoid executing many queries in favor of consolidating them into just a few. Every time you fire off a query to the DB server, your program has to wait for data to come back. It’s also a good idea to avoid hugely complex queries, especially if you need to plan for scalability. SQL gives you a lot of rope to potentially hang yourself, keep performance in mind and measure it!

A word about legacy libraries
You may be wondering why I’m referencing an obsolete MySQL API above. Even though the very first example is all about DB abstraction libraries, people still bring it up. To make it clear: I’m not condoning its usage, but chances are you’ll come across similar problems at some point, for example if you’re debugging legacy code or if you work with other libraries where proper escaping becomes an issue. It’s worth at least being aware of unsanitized input at all times. I’m not advocating the use of failure-prone, low-level, obsolete libraries – I’m trying to talk about being more conscious of dirty data. Again: use PDO, or whatever suits you, and avoid naked mysql_ calls.

What to know:
- MySQL, obviously: http://dev.mysql.com/doc/refman/5.0/en/tutorial.html
- Indexes are essential

File Uploads

File uploading is the act of accepting a file from the browser into your web application. Like all user input, you must be prepared for the worst. People will try to crash your uploading code or they will try to upload executable files onto your server. When accepting an uploaded file, it’s important to check its content first before storing it on the server. For example, if an attacker manages to upload a PHP code file into one of your directories, it’s game over. That’s what happened to the Trojans. One day they allowed a highly executable horse to upload into their root directory. It was not a good day.

To avoid this, you have to make sure that, say, an image file uploaded to your app is actually an image file. The $_FILES variable contains information about a file’s MIME type. Sadly, you have to disregard this information, because it’s been supplied by the user’s browser and is thus utterly evil. Instead, you have to get the actual MIME type of the file directly. In the good old times, you could use mime_content_type(), but the manual marked that function as deprecated in favor of the Fileinfo extension. Rescue comes from an unexpected place: PHP’s image functions include one that, among other things, returns the actual MIME type of image files: getimagesize(). It’s usually filed under the GD library but works without it. Use this to check what your uploaded file actually contains and simply reject everything that does not correspond to one of the MIME types explicitly allowed by you.
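A minimal sketch of such a check. The function name detectAllowedImageType() is hypothetical, and keep in mind that getimagesize() only inspects the file header, so this is a first filter rather than a guarantee:

```php
<?php
// Return the detected MIME type if the file is an allowed image, else null.
function detectAllowedImageType($path, array $allowedTypes)
{
    $info = @getimagesize($path); // false when the content is not a known image
    if ($info === false || !isset($info['mime'])) {
        return null;
    }
    return in_array($info['mime'], $allowedTypes, true) ? $info['mime'] : null;
}
```

In an upload handler you would pass the temporary file, for example $_FILES['picture']['tmp_name'] (the field name "picture" is just an example), together with a list such as array('image/png', 'image/jpeg', 'image/gif'), and reject the upload whenever the result is null.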

What to know:
- MIME types: http://en.wikipedia.org/wiki/MIME
- HTTP methods: http://en.wikipedia.org/wiki/HTTP#Request_methods

Extra Credit: Caching

On many servers, you will have access to a service called memcached. It’s essentially a mini server process that allows you to store and retrieve arbitrary data very fast. To save or retrieve a data package from memcached, your application needs to connect to the memcached server and give it the key of the object you’re interested in. This key can be any string. Keep in mind, though, that other applications may be using the same key/value storage, so choose your keys in a manner that does not lead to conflicts. Remember to store user-specific information with a user-specific key.

As with any server, connecting to memcached and retrieving a cached object costs time. It’s only worth it if that interval is actually shorter than the time it takes to recompute a given data structure.

In the absence of a fancy memcached service, you can simply fake it, too. For example, if your application needs to generate a report or a big chunk of HTML that seldom changes, you can store that data in the file system as well. Simply give your application a tmp/ folder and dump your cached objects there. But while memcached objects expire automatically, you’ll have to take care of the lifecycle of cache files yourself.
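To make the lifecycle part concrete, here is a minimal sketch of such a file cache. It is written in Python rather than PHP purely to illustrate the idea; the function name cached(), the max-age parameter and the write-then-rename step are my own assumptions, not a fixed recipe.

```python
import os
import tempfile
import time


def cached(path, max_age_seconds, compute):
    """Return the cached text at `path` if it is fresh, else rebuild and store it."""
    try:
        age = time.time() - os.path.getmtime(path)
        if age < max_age_seconds:
            with open(path, encoding="utf-8") as handle:
                return handle.read()
    except OSError:
        pass  # no cache file yet (or it is unreadable): fall through and rebuild

    value = compute()

    # Write to a temporary file first and rename it into place, so that
    # concurrent readers never see a half-written cache file.
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(value)
    os.replace(tmp_path, path)
    return value
```

Calling cached("tmp/report.html", 600, build_report) would rebuild the report at most once every ten minutes (build_report being whatever expensive function you already have). Stale files are only replaced when somebody asks for them, so an occasional cleanup of the tmp/ folder is still your job.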

More Performance Tuning

Keeping your response times low inside the app is just the beginning. There is much more you can do to speed up page loading. Going into all the details here would be drastically out of scope. There are several books dedicated to these problems. Just a few performance hints here that are not generally advertised:

Consolidating static files
PHP makes it ridiculously easy to stream arbitrary content to your users, including JavaScript and CSS files. The nice aspect of serving JavaScript and CSS from PHP is twofold: first, you have full control over the HTTP headers needed to tweak the browser’s caching behavior – this is an option you don’t have when serving static files from your webserver directly in most cases. Second, you can actually concatenate multiple JS or CSS files into one big chunk and give this chunk out in one go.

For example, say your web app needs jQuery, 4 jQuery plugins, 2 custom JS code files and 4 CSS files. A browser must make 11 requests to get those files. While the files themselves may be small, the latencies of requesting them can add up to a considerable delay. What’s worse, you’ll have at least 11 include tags inside your HTML for every request! You could now combine those files manually inside a text editor, but that makes them harder to maintain. It’s easier to just have PHP compile two large files: one JS and one CSS!

Keep an eye on your output
From time to time, have a look at the HTML code your app is producing. Sometimes it’s amazing how much junk can creep in there. Have a look at every element and ask yourself if this really needs to be in there. The same goes for CSS and JavaScript files: over time they tend to accumulate dead-end sections that are not used anymore. The Unix command "grep" is your friend. With grep, you can search your entire app for all occurrences of a string, and it can really help you find out whether a given piece of code is actually used.

You Do Have Comments

That’s it from my end. There is lots more to be said on all of these subjects, but it should be enough to point interested people in the right direction. Chances are, battle-hardened PHP veterans disagree with any or all of these suggestions. Some may even get violently ill upon reading this. Pay attention to them, they might have a point. Or not. Decide what makes sense to you.

Happy coding!


Things I Can’t Do Anymore On Mountain Lion

I’m pretty annoyed at OS X lately, so here are some

Things I Can’t Do Anymore On OS X Mountain Lion (That Were Possible On Lion)

Can’t use as many screens as I’d like

The black surface you see there is one of my screens. Mountain Lion is reasonably certain the display exists, it just won’t actually show anything on it anymore. It’s just black, except for the mouse pointer. That still gets displayed, but no desktop, no windows. Even more annoying, you can have windows on that screen but they won’t be shown. So if you’re wondering where that Finder window is you just opened, it’s probably under the black shroud.

Mountain Lion doesn’t remember screen arrangements

Every second reboot or so, ML forgets the placement of its displays. It will then either assume they’re all in a single line in some arbitrary order, or it will methodically swap all the screens left of the center with those right of the center. I expect this shit from Linux, but apparently only Microsoft can do multi screens correctly now.

Can’t have a quiet evening

I’ve been wondering why my Mac got so loud after upgrading to ML. iStat Pro shows why: the HD/Expansion slot fans are running high. I suspect this is due to the increased workload of the graphics cards (which aren’t doing anything dramatic besides displaying static content right now), but then again it might just be a random SMC bug. I don’t think they’ll ever fix this though, because it happens on the Mac Pro.

Can’t connect to remotely mounted DMGs

I got most of my data on an encrypted virtual disk. Prior to ML, I could connect to it from other Macs just as if it were a normal drive. If it was in /Volumes you could connect to it remotely. Now with ML you can only connect to physical remote drives. For no fucking reason at all.

Can’t start applications quickly

What you see there is the area of the screen where I’m waiting for Text Edit to load. On an 8-core Mac Pro with 16GB RAM and a fast SSD system drive. Yes, Text Edit. To be fair, non-Apple apps generally load faster, but this is a telling symptom I think. My guess: it has to do with the revolting iCloud “integration” all Apple apps have now.

If you own (reasonably recent) Apple Macs, upgrading to OS X 10.8 Mountain Lion does not feel like an optional step. If you’re like me, you’re always excited about new features and running the latest version of everything just gives you a warm fuzzy feeling all over. I’ve heard that a lot of people actually see OS X as a necessary evil of owning a Mac (which is completely false, as you can run Windows or Linux on them just fine), but to me the OS is the main reason why I’m a Mac user. But looking at Mountain Lion, it seems increasingly unlikely that I’ll remain a Mac user for the next decade or so.

Starting with Snow Leopard, OS X is apparently on a mission to disempower the user. Obvious bugs don’t get fixed, like the mysterious inability of Spotlight to find relevant files – sometimes it can’t even find applications that sit right there in the /Applications/ folder. The full screen feature remains stubbornly useless on systems with multiple displays. If an app didn’t come from the App Store, Mountain Lion won’t let you run it. Actually it grudgingly lets you run non-Apple apps if you turn on some obscure setting in the System Preferences app, but who except me does this?

Then there are these things I can’t do on Mountain Lion that were possible before. I’m sure other users would have other things to add to that list.

Let’s Stop the Unix Time Insanity

The good old Unix timestamp is something I’ve been using as a developer forever. It’s a really neat concept when you get down to it: a simple integer number counting up the number of seconds passed since January 1st 1970. It works brilliantly, especially if your code can operate on the assumption that this number is based on UTC.

If you’re like me, you might have naively assumed the following implementation was at work behind the scenes: first we have an integer I counting the number of seconds that have passed since a fixed date in a fixed time zone, then we use some tables and rules R() to generate from format Z a human-readable date D suitable for a timezone T. So by calling R() we would get a sane representation of a date like this:

R(I, T, Z) = D

Those four final words cause all the insanity and headaches in relation to handling time on computers, including just recently the outages of some very high profile web sites. I know there is a lot of smugness going on in developer circles where people get high on posting comments such as “of course it is like this. it’s the way we’ve done it forever, it’s the only way.” and there will even be a lot of links to pretentious articles telling you 1001 things you supposedly didn’t know about time. But this isn’t about all that. Let’s put all the inevitable straw man posturing aside and look at the core problem: Unix timestamps are thoroughly and unnecessarily broken. They should be a continuum.

Everybody already assumes they are continuous, sometimes even kernel developers do. This is a reasonable assumption. Let’s make it reality. Leap seconds and other timing stuff belong firmly in the human-readable layer; the decision to include logic for this in the timestamp code itself was wrong. Countless developer hours have been wasted on it, for no good reason. Let’s throw all of that out.
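For what it’s worth, the layered model is easy to sketch: a plain counter I underneath, and everything calendar-related in a separate rendering step R(). The following Python snippet is only an illustration under that assumption; the function name render() and the fixed +01:00 offset are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def render(i, t, z):
    """R(I, T, Z) = D: render counter `i` for time zone `t` using format `z`."""
    return datetime.fromtimestamp(i, tz=t).strftime(z)


FMT = "%Y-%m-%d %H:%M:%S"
utc = timezone.utc
plus_one = timezone(timedelta(hours=1))  # fixed offset, picked for illustration

print(render(0, utc, FMT))       # 1970-01-01 00:00:00
print(render(0, plus_one, FMT))  # 1970-01-01 01:00:00

# Around the leap second at the end of June 2012, two consecutive
# timestamps render as 23:59:59 and 00:00:00. The inserted second
# 23:59:60 has no timestamp of its own, so the counter is not a
# true count of elapsed seconds.
print(render(1341100799, utc, FMT))  # 2012-06-30 23:59:59
print(render(1341100800, utc, FMT))  # 2012-07-01 00:00:00
```

In a continuous scheme, the leap second would simply get the next integer, and it would be R()’s job to know that this particular value prints as 23:59:60.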

I++;

Want me to remember your site? Fat chance.

Pieces about the decline of RSS are old hat by now, and they are probably correct in assuming that Joe Normalsurfer doesn’t care about subscriptions, he cares about social spam news instead and gets all his info from Facebook. However, this is not how it works for me – and I suspect I’m not the only one. I use my social networks for connecting with people, as opposed to sucking down marketing or news by the megabyte.

If your site doesn’t have an RSS feed, I’ll have forgotten about it by this time tomorrow. But this is, coincidentally, how site owners seem to pick their goals: have your five minutes of badly segmented random web traffic now and then be forgotten until you make the viral news once again next year. That’s a bad path to be on. You should be forging long-lasting relationships with your users, or at least with your power user nerd audience. A 100-word blurb on TC is not as effective as you might think. Spare yourself the stress and frustration of constantly spewing out me-too press releases and “news” in the vain hope of landing a viral hit. Build a relationship instead, bit by bit.

The only way to do that is to offer an RSS feed. No feed, no relationship.

Success or Failure: Analyzing 599 Kickstarter Projects

As news of ever-increasing pledge amounts for Kickstarter projects comes in daily, I find myself wondering. It sometimes becomes difficult to separate anecdotes from data. People (including me) tell other people to “just get some crowd funding”, but how likely is it really to raise any money there? I’m worried that for average project founders the chances of getting funding are pretty slim, while the “big” well-connected guys seem to make ridiculous amounts of money (and not always for a readily-apparent reason, either).

Scraping 599 Project Summaries
The Kickstarter site itself does a good job of hiding the severe limitations it places on browsing through projects. It’s not obvious at first glance how amazingly difficult it is to simply browse a list of projects which has not been filtered and pre-selected to show only those items most likely to succeed. But it’s true. Likewise, it’s not easy to get a list of completed projects, or even some data about failed ones. A lot of projects are on that site – but how many? Difficult to answer. At every corner, the UI does its best to show us only what we are supposed to see.

To get a better feel for the success chances of projects, I decided to pull project data from the “Ending Soon” list. I chose this list because – as far as I can tell – there is little if any filtering going on. It’s the only meaningful list on that site that gives you just the projects, sorted by ending date. I got 599 of them, which seems to be the limit of the endlessly scrolling page. This is the best option I could find short of writing a spider to scrape the site.

The Data
Here is the raw data from this evening. It’s a list of projects, showing how far along they are both in time and funding. The Kickstarter site makes it difficult to know how long a project has been going on, but at least we know when its deadline is. Using an average project duration of 20 days, it’s possible to guess whether a project is going to fail or not. Of course, the closer a project comes to its deadline the more accurate that assessment becomes.
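To spell out the guessing step: the simplest way to do it is a straight-line extrapolation of the pledges so far over the assumed 20-day duration. The sketch below shows that idea in Python; the function names are made up for illustration, and the linear model is an assumption, not a description of how Kickstarter pledges actually accumulate.

```python
def projected_pledge(pledged, days_left, duration_days=20):
    """Extrapolate the final pledge amount, assuming a constant pledge rate."""
    elapsed = duration_days - days_left
    if elapsed <= 0:
        raise ValueError("project has not been running long enough to project")
    return pledged * duration_days / elapsed


def will_succeed(pledged, goal, days_left, duration_days=20):
    """Guess the outcome by comparing the projected pledge with the goal."""
    return projected_pledge(pledged, days_left, duration_days) >= goal


# A project with $500 pledged and 10 of its assumed 20 days left
# is on course for $1000.
print(projected_pledge(500, 10))
print(will_succeed(500, 1200, 10))
```

With only a day or two left, the extrapolation barely changes the number, which is why the guess gets more reliable towards the deadline.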

At first glance this shows (on a log10 scale) the projected pledge amounts versus project minimum funding goals of all projects that have 6 or fewer days to go until deadline. Everything south of the blue line is going to fail. As you can see, failed projects often miss their minimum funding threshold by about one order of magnitude; it’s uncommon to fail by only a small margin. Successful projects, here shown as above the blue line, seem to be more heterogeneous but apparently often manage to get more than double their threshold funded.

Of the 599 projects sampled, 335 are projected to succeed while 264 will most likely fail. This means you can expect a 56% chance of getting your funding when applying on Kickstarter.

Let’s have a look at success chances as segmented by project magnitude (aka. the funding goal):

Funding goal          < $1000   < $10000   < $100000   < $1000000
Successes                  63        211          57            4
Overfunding (avg.)       173%       160%        152%         125%
Failures                   19        179          64            2
Underfunding (avg.)       45%        39%         41%          26%
Success ratio             77%        54%         47%          67%

Now, while the results in the top magnitude bracket are not really statistically significant, it’s apparent that most successful projects are overfunded by at least 50%, even if they are relatively large! Failing projects, on the other hand, can expect to be underfunded by about the same percentage.
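The success ratios are just successes divided by all projects in a bracket. Here is a small Python check of that arithmetic using the counts from the table; the variable names are only for illustration.

```python
# (successes, failures) per funding-goal bracket, copied from the table
brackets = {
    "< $1000": (63, 19),
    "< $10000": (211, 179),
    "< $100000": (57, 64),
    "< $1000000": (4, 2),
}


def success_ratio(successes, failures):
    """Share of successful projects, as a whole-number percentage."""
    return round(100 * successes / (successes + failures))


for label, (successes, failures) in brackets.items():
    print(label, success_ratio(successes, failures))

total_successes = sum(s for s, _ in brackets.values())
total_failures = sum(f for _, f in brackets.values())
print("overall", success_ratio(total_successes, total_failures))
```

The bracket totals add up to the 335 projected successes and 264 projected failures mentioned earlier, and the overall ratio comes out at 56%. With only six projects in the top bracket, its 67% could easily swing either way.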

Among the failed projects, there was an average 0.42 orders of magnitude shortage between the funding goal and the actual pledge amount (standard deviation 0.29). Successful projects overshot their funding goal by an average 0.17 orders of magnitude (standard deviation 0.21).
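Orders of magnitude are not very intuitive, so here is a quick Python conversion back to plain ratios. The helper names are made up, and keep in mind that an average taken on the log scale corresponds to a geometric mean, so it will not match the arithmetic averages in the table exactly.

```python
import math


def orders_of_magnitude(goal, pledged):
    """Log10 distance between goal and pledge; positive means a shortfall."""
    return math.log10(goal / pledged)


def pledge_to_goal_ratio(orders):
    """Convert a signed log10 shortfall back into pledged / goal."""
    return 10 ** -orders


# Failed projects: an average shortfall of 0.42 orders of magnitude
print(round(pledge_to_goal_ratio(0.42), 2))   # about 0.38 of the goal

# Successful projects: an average overshoot of 0.17 orders of magnitude
print(round(pledge_to_goal_ratio(-0.17), 2))  # about 1.48 times the goal
```

So a typical failing project ends up with a bit more than a third of what it asked for, while a typical successful one lands at roughly one and a half times its goal.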

So what does it mean?
After a lot of hand waving, you can assume you’ll have a roughly 50% chance of getting funded on Kickstarter. While bigger projects are less likely to succeed, the drop-off in likelihood is NOT linear with project size – so it’s probably a good idea to aim high. If your project is bumbling along at 50% or less projected funding: don’t expect any last-minute miracles, because the data does not show an increase in funding activity towards the deadline. It looks like in most cases, you should be able to tell after a few days whether a project is going to fail or succeed.

I believe there is more interesting data buried way deeper inside Kickstarter. For example, the question whether an “average Joe” type of person has a realistic chance of getting funded is still open. I suspect you’ll probably have an extremely slim chance if you don’t have some social media heavyweights ready to back you and gently steer the crowd towards your project. If this hypothesis is true, it would mean Kickstarter is something of a bubble movement (as in: a mass movement driven by dubious value assessments due to the influence of parties who are experiencing a conflict of interest). It would be interesting to look at Twitter data in this respect – maybe another day.

Update: Dan Misener seems to agree.