Feb 202012

Someone recently pointed me to this blog, and it spurred my to go ahead and write this one, which I had been delaying writing until I had more time. For those of you interested mostly in startups/entrepreneurship, you may want to skip the first part, since it’s mostly me prancing around on my high horse.

Raising Taxes

As it stands, there’s this continuous debate over raising taxes and cutting spending. Being fairly conservative (of the Libertarian type), one would guess that I’d probably be against raising taxes. Sadly, this is not the case at all. Even though I think cutting spending is the number one thing that should be done, I think taxes should be raised. On. Everyone. Why?

Well, the reality is, cutting spending and raising revenue (including by raising taxes) will be more effective than either one alone. There shouldn’t be this stupid trade off between spending cuts and raising taxes. The trade offs should rather be on what to cut, and what to raise revenues with. In other words, when you start horse trading using spending cuts and tax breaks, you tend to end with neither (need I point to a specific example?). The more one side wants to do of one thing, the more the other side wants to do of the other, the further apart they get, and nothing gets resolved because everyone ends up being moronic.

But why raise taxes on EVERYONE? Punishment. Pure and simple. The American people got us into this mess, and so the American people need to pay the price. The debt increased because we voted idiots into office (and this is not singling anyone out, just about everyone elected for a long time have been certifiably stupid), and so now we have to pay the price. You know, cause and effect.

How much? Not much. I’d say just a percentage point or two for those who pay taxes, and perhaps reduce the refund by a similar amount for those who don’t pay taxes. Will it raise a whole lot of revenue? I have no idea, I can’t find a simple calculator that will let me figure out, but it’s mostly beside the point. Will the average person making $50k a year be hurt that much by losing less than $50 a month? No, not really. But the psychological impact is there, and hopefully they’ll realize what COULD happen if they don’t start electing people that are smarter than fifth graders.

On the other hand, Republicans are so stuck on this idea that increasing taxes on the wealthy is a horrible idea. I have yet to hear a single argument that passes the straight-face test. Less money for investment? Please. You give a person with assets of over a couple million dollars an extra million in cash, and what do you think they’ll do with it? Probably put it in the stock market. Which, contrary to was most seem to think, is generally going to be pretty worthless when it comes to having an impact on the economy. Sure, it might increase liquidity, but when you’re talking about buying stocks that are almost as liquid as cash, you’re not actually helping anything.

So how do you raise taxes and create economic growth?

In theory, it’s pretty simple. Small businesses and entrepreneurs are one of the major factors in economic growth. Not only do they spend a significant amount of money (instead of collecting it and passing on huge profits to executives), but they are also the engines of innovation. How do you use tax raises to help startups?

If you look at the blog I links to above, there are some suggestions that would cost money (and therefore need a source of revenue). But no one likes to raise taxes and claim the revenue will be used for a specific purpose…we all know that’s not what happens. So, how about targeted tax breaks? I have always been a fan of targeted tax breaks…or at least the concept of them. There are plenty of ways to screw them up, but I think if done correctly, they can have a very positive impact.

Recently, as it turns out, I have been pursuing, or at least researching, starting a business (now that I have spent three years throwing money into law school…makes total sense, right?). Naturally, at some point, I started thinking about how one might put those two together. I think my thought process was spurred by things like Y Combinator and CapitalFactory. There are many, many applicants, but only so much money and resources to go around. So how do you combine them?

Easy: Raise taxes on the rich, and give them the possibility to take advantage of offsetting tax deductions if they provide revenue for a true early stage startup, or a startup incubator/fund, like Y Combinator or CapitalFactory. In other words, if you raise taxes on everyone 1%, raise the taxes on high income earners 3%. But then allow them to deduct up to 3% of their income if they use it to support a startup. Or, some other combination that you think will work. If you don’t want to raise taxes at all (i.e., you disagree completely with the first part of this blog, which is understandable), don’t just indiscriminately cut taxes, use the prospect of lower taxes to DO SOMETHING GOOD. Hell, offer up to a 2 or 3% deduction to anyone who donates to a healthcare group that is dedicated to providing cheap healthcare to startups (a problem mentioned in the other blog). I don’t really care, but you get the point.

The reality is, higher income earners are in the best place to handle this kind of risk. Let’s imagine a person who brings in a million dollars a year. If they donate $30,000 to a startup, and then that startup goes belly up, they’re not impacted to a huge degree. But, on the other hand, if that startup creates the next new technology that increases the gas mileage of cars by 20%, they’ve contributed to a great advancement that has benefitted society (and them). And even if the company goes belly up, that’s investment has been injected into the economy to actually buy things or pay people; not thrown into the stock market where there’s going to be virtually no impact on the economy.

The other great thing about this idea? The investors are still investing, and still get the benefit of a good return should their investments pan out. For example, if that $30,000 turns out to be worth $300,000, you now get to tax $300,000 in income from a high income earner, creating more revenue, while encouraging MORE money to be spent on MORE startups. You end up encouraging the people who are more likely to be able to handle higher risks to invest less conservatively and put their money out there in an effort to generate a greater return not just for themselves, but for society. And this can be combined with other ideas, like providing tax breaks directly to startups.

It should be noted that this is similar to what exists (or did exist, or was proposed, I don’t remember), where when one sold stock in a company that qualified as a small business or the like, the income could be deducted or something like that. The main difference here is instead of encouraging the selling of a position in which you might already be invested in (or making it a better time to get OUT of an investment), you’re encouraging people to actually INVEST. Allowing a deduction on selling a stake in a company doesn’t really do that, but at the same time, that same type of program can be used to complement a program like I am advocating.

I have spent some time thinking about this, but for the most part, this is stuff outside of my expertise, so I welcome any comments on the topic!

Feb 092012

I have three or four blog posts in the works, but nothing will happen on them until the end of the month. They’re all fairly extensive, and between trying to study for the bar exam and some personal projects done, I don’t have the time. But, I’m going to go ahead and take the time to do this post, and probably one more while the information is still fresh in my mind.


As I just mentioned, I’ve been working on some personal projects. Mainly one big project with some small variations on that one. Basically, it’s just a pretty straightforward web application. Up until a couple months ago, I was using a MySQL backend. Then my buddy Travis starting blathering on about his latest infatuation with NoSQL and CouchDB/Membase. I decided to take a look, just because I had never really taken the time to learn anything about NoSQL. It seemed very useful, particularly for rapid prototyping, which is pretty close to the mode I have been in. Not having to deal with schemas and not having to write SQL queries is a godsend for that type of stuff.

I ended up comparing most of the big NoSQL DBs (like CouchDB and MongoDB), and for whatever reason I settled on CouchDB. There was probably a reason, and it would be useful for me to remember, but it was a couple of months ago so that won’t happen without reevaluating everything, and I don’t have that kind of time.

First, I paired CouchDB with PHPillow. PHPillow is, for a PHP project, one of the better designed projects I have come across. Now, that doesn’t make it great (or even good, really), but it really is pretty decent, comparatively. I quickly realized it was mostly useful for the interface with CouchDB (i.e., the querying aspect), and really didn’t work well with my program design (and probably wouldn’t work well with most properly designed applications). After reworking it back into my existing app, I finally started getting to use CouchDB. It was…glorious. Due to the nature of the project, the “schema” would (and will still) change frequently, so this sped things up greatly. Designing the views was simple and quick, mostly, and interfacing with the same. I can’t put a number on how much faster it is to develop with, but it is significant. But…

The Problems

There are two main problems I encountered that I’ll focus on:

  • Inability to “filter” or use wildcards to select results from a view
  • Pagination

But first…

A Disclaimer

Please note that, even though I’m pointing out the problems I had with CouchDB, that doesn’t mean that the problems are WITH CouchDB. It is just as likely, or perhaps MORE likely, that I didn’t do sufficient research and chose the wrong tool for the job. CouchDB is the way it is, and probably for particular reasons. In other words: CouchDB isn’t really made for the application I was using it for.

So please don’t take this as a knock on CouchDB. Even though I may use derogatory terms, it’s more due to the frustration that I caused myself, not because CouchDB sucks.


Let’s first set up an example:

Say you have a website that has products listed on it. The products have a few attributes, including a category, a brand, and a size. It’s a very, very basic feature that you might want to filter based on any combination of those attributes. Such as:

Find all products of ‘Brand C’ in size “Medium”. Or,

Find all products in ‘Category 1′, of any brand, and any size.

You can figure out the other possibilities. Well, with CouchDB, you create views into a product, which acts as an index. The important part of these views (or map) for this discussion are the keys. In this case, I might create a view that has an array for a key that looks like:

[product.category, product.brand, product.size]

…for each individual document in the database. So each document that gets saved would end up with an entry in that view that contains each of those elements in the key. If I want to find all products in ‘Category 1′, I can just send the server this:

['Category 1']

And it will spit back anything that matches ‘Category 1′ in the first array elements, and anything in the second and third elements. Now, what if I all products in ‘Category 1′ with ‘Brand A’? Easy:

['Category 1', 'Brand A']

That gets me what I want, in any size. So, what happens if I want to get all products in ‘Category 1′ that are size ‘Medium’? Should be something similar to:

['Category 1', *, 'Medium']

Right? No. Can’t. Be. Done. From my understanding, due to the fact this is all indexed using b-trees, it is incredibly “inefficient” to do that. Why exactly, I don’t know, since I’d have to go back and relearn b-trees and see how they’re implemented. But, needless to say, this is a huge limitation. There are ways around it, but they all really, REALLY suck. For example, you can create another view that has the size as the second element, like so:

[product.category, product.size, product.brand]

But you have to do this for almost any combination. And then you have to pick (in your code) what view you want to use based on which particular filters are being applied (which of course, could be any combination if you want your site to have that basic feature). There are other alternatives using multiple queries, the reduce function, or in-memory sorting, but none of those are acceptable for such a basic feature.

As a little more background, you can do something similar to wildcards, but the problem is, once CouchDB finds the matches in the column the “wildcard” is used, it stops on that column. In the wildcard example above, CouchDB would match all keys with ‘Category 1′ in the first column, anything in the second column, then…stop. Wouldn’t even look at the third column, so you get all values in that column.

How is this done in MongoDB?

Something like this:

db.products.find( ['category': 'Category 1','size':'Medium']);

That should get exactly what I want. Will it be as quick? Maybe not, I don’t know the differences in the underlying implementation yet. But regardless, it can actually be done, and very simply (without having to create a view/map in the first place).


In the context of the example as above, pagination is a very basic feature. You might have a thousand products, and you don’t want to display the full list at once, so you want to paginate. Very basic. Should be easy to do, right? Not in CouchDB.

The basic pattern involves using a range query (which is what the pseudo-wildcard functionality is) and limit the results to NUM_PER_PAGE + 1. So if you want twenty results per page, you get twenty-one rows from the database. You then store the last column of the first row and the last column of the last row. The query itself would be like:

startkey=['Category 1', 'Brand A', null]&endkey=['Category 1', 'Brand A', {}]&limit=21

This effectively gets all items of ‘Brand A’ that are in ‘Category 1′, of any size, and limits it to 21 results. For comparisons, null is the smallest value, and objects are the largest, so this finds any numerical values or strings (among other things that aren’t relevant). Let’s say the first and last rows are these:

['Category 1', 'Brand A', 'Medium']
['Category 1', 'Brand A', 'Small']

One of the neat features of CouchDB is that it sorts the view results based on the columns, from left to right, which explains why Medium would be the first, and Small would come after it. We display 20 rows, and just save the info from the 21st. So, again, we would store the value ‘Small’, because this is the query to get the second page:

startkey=['Category 1', 'Brand A', 'Small']&endkey=['Category 1', 'Brand A', {}]&limit=21

Notice the change? Instead of starting the search with null, we start it with the value of the 21st row, and once again, limit it to the following 21 rows. The results will start with the last row of the previous query (the one we hid):

['Category 1', 'Brand A', 'Small']
['Category 1', 'Brand A', 'XXXLarge']

…and ends 21 rows forward. And this goes on until we get to the end. Not a huge deal, right? Well, we have to keep track of all these values, because to go to previous pages, we have to “undo” these queries. So this entire time, we have been building a list, which is now:

'Medium', 'Small', 'XXXLarge'

You can think of it as a stack. Going to previous pages requires us to pop items off the top of the stack. Oh, and one other important thing: You can’t figure out the number of pages by doing this. Okay, in reality it’s not THAT important, but it’s useful and a nice feature to have.

Now, there IS a “skip” option, which would allow you to do something similar to a normal SQL limit query. But according to the CouchDB folks, this is really, really slow, and should not be done. So, there you have it.

How is this done in MongoDB?

A lot like you would do it in SQL, actually. In SQL, you just throw a limit clause at the end of a query, which would look something like:

SELECT * FROM products LIMIT 40,20

This would get 20 rows, starting from the 40th. Simple. MongoDB is just as easy…but it requires a small amount of background.

When you use MongoDB, you make a call to find(), which returns a cursor. This may be a familiar concept to those who have used SQL before. Essentially, it points to a spot in the list of results, and allows you to traverse over them (and sometimes back). With MongoDB, the query doesn’t actually get run until you try to access the results. So, no query is run with the following code:

db.products.find( ['category': 'Category 1','size':'Medium']);

At least not until you actually try to access it. That means you can do this:

$cursor = db.products.find( ['category': 'Category 1','size':'Medium']);
$cursor->skip( 40 );
$cursor->limit( 20 );

That’s it. Basically an object oriented method of what one would do with SQL. Now all you have to remember is what page you’re on, just like most people would be familiar with. And with the PHP driver/interface, the calls to skip() and limit() return the actual cursor itself, so you can chain them:

$cursor = db.products.find( ['category': 'Category 1','size':'Medium'])->skip( 40 )->limit( 20 );

Done. Again, might it suffer from the same problems that cause the CouchDB implementation to be inefficient? Possibly, but if so, they sure as hell don’t mention it on any of the beginner documents. It may be there and I just haven’t found it though.


A few other odds and ends.

As I mentioned above, there is a PHP driver for MongoDB. It is really, really clean and well put together. Seems to be at least PDO quality. Because of this integration, it is even easier to get working. In fact, I had the core CouchDB stuff replaced in a couple of hours, and only expect it to take another couple of hours to completely replace all CouchDB references in my application. MongoDB supports ad hoc queries, as well as indexes and maps, like CouchDB, which will further speed up development.

One thing that really annoyed me about CouchDB is that it is a straight document store. You can’t group related documents together. you can accomplish the same thing with views, but it seemed a bit…dirty. With MongoDB, you have “collections”, which are basically the same thing as SQL tables. There are a bunch of other features that I haven’t gotten into yet that will make things much more enjoyable than CouchDB.


CouchDB probably has its place. Rather, I KNOW it has its place. But for most web applications, MongoDB seems to be the better choice. Even if CouchDB is faster, it won’t matter for most web applications. Even if MongoDB is a little bit slower than a SQL solution, it still cuts development time by A LOT. At least for prototyping, it would be well worth it, even if you have to eventually switch back to SQL. (I say that now without having actually done it, so take that with a grain of salt, but that’s my intuition.)

I don’t know why CouchDB doesn’t support many of the same features; it could be because they ARE more inefficient, and CouchDB is all about performance…or it could simply be that they don’t have time or the man-power to add them. I don’t know, so I can’t really fault CouchDB. I’m sure for an application that requires the best performance possible, CouchDB is perfect. But for a generic web application (99.9% of the applications), MongoDB seems to be a better choice. There are obviously more evaluation that needs to go into the decision for a live, customer-facing application, particularly involving things such as replication, so you shouldn’t rely solely on this review to pick one or the other. I certainly have not ruled out going back to SQL once the app is developed, but I will evaluate that when it is necessary. For my current status, it looks like MongoDB is going to be much better.

I will write another blog in the next couple of days that gives some details on how my application is set up, and what work was involved in switching over to MongoDB.

I’d also like to provide a lot of thanks to Jamie Talbot for this blog, which finally made me realize that I couldn’t do what I wanted. Nothing else I came across was very helpful. But no thanks to him for getting me digging through the actual Erlang to try to figure out why it wouldn’t work and/or how I could make it work (to no avail).