Pages

What Programming Language Should I Use to Build a Startup?

Often entrepreneurs ask me 'What technology should I build my startup on?' There is no right or wrong answer to this question. It's a decision every company makes for itself, depending on what it's trying to build and the skills of its cofounders. Nonetheless, there are a few rules that one should adhere to. We discuss them in this blog post.

Incident Response Policy

What happens in your company when a production incident occurs? In a typical startup, you will see engineers running around frantically trying to resolve the problem. However, as soon as the incident is resolved, they forget about it and go back to business as usual. A good incident response policy can help bring order to chaos. We provide a sample template in this blog post.

Why Software Deadlines Never Make Sense

We discuss why software deadlines usually don't make sense.

Analyzing Front-End Performance With Just a Browser

We discuss a number of freely available online tools that can be used to analyze bottlenecks in your website.

Why Smaller Businesses Can't Ignore Security and How They Can Achieve It On a Budget

In this article, we show that security is both important and achievable for smaller companies without breaking the bank.

Wednesday, November 28, 2012

CloudBeat 2012 -- The S Factor: Optimizing a Cloud-Based Platform for Scalability and Security

Cinchcast is a cloud-based, enterprise solution for webcasts and conference calls of any size. On a monthly basis, Cinchcast powers 15 million audio streams and attracts over 36 million unique visitors. In this talk, we’ll discuss how Cinchcast development and production environments operate and the role of New Relic in scaling the Cinchcast platform to meet event demands. Dr. Yampolskiy will explain how Cinchcast maintains agile release cycles while monitoring for performance and security issues. He will give some concrete examples, such as how a drastic drop in page views was discovered through a monitoring tool, and how his team thwarted a DDoS attack through cloud provisioning.
Speaker: Dr. Aleksandr Yampolskiy, CTO, Cinchcast
Moderator: Vanessa Alvarez, Director of Product Marketing, Gridstore

Monday, November 26, 2012

Good Questions to Ask During Technical Architecture Reviews


Here is a list of good technical questions to ask during technical architecture reviews.
If a presenter doesn't know the answers to them, the product is probably not ready to be built:

  1. Can you draw a systems diagram for me?
  2. How will this work on 4 or more boxes? How will you load balance requests between them?
  3. What's the average latency for a request? What can you cache? (Again, if a person hasn't thought this through, then the system isn't ready.)
  4. How will you test this?
  5. What can fail? How can we build a system so that it degrades gracefully when failures happen?
  6. What are the security risks?

Sunday, November 25, 2012

Cinchcast Architecture - Producing 1,500 Hours Of Audio Every Day

(This article originally appeared on the High Scalability website a few months back: http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html)


Cinchcast provides solutions that allow companies to create, share, measure and monetize audio content to reach and engage the people who are most important to their business. Our technology integrates a conference bridge with live audio streaming to simplify online events and enhance participant engagement. The Cinchcast technology also powers Blogtalkradio, the world’s largest audio social network. Today our platform produces and distributes over 1,500 hours of original content every day. In this article, we describe the engineering decisions we have made in order to scale our platform to support this volume of data.

Stats

  • Over 50 million page views a month
  • 50,000 hours of audio content created
  • 15,000,000 media streams       
  • 175,000,000 ad impressions
  • Peak rate of 40,000 concurrent requests per second
  • Many TB/day of data stored in MSSQL, Redis, and ElasticSearch clusters
  • Around 100 hardware nodes in production

Data Centers

  • The production website runs from the data center in Brooklyn. We like to control our own destiny instead of relegating data to the cloud.
  • Amazon EC2 instances are used mostly for QA and Staging environments.

Hardware

  • About 50 web servers
  • 15 MS SQL database servers
  • 2 Redis NoSQL key-value servers
  • 2 NodeJS servers
  • 2 servers for the ElasticSearch cluster

Dev Tools

  • .NET 4 C# : ASP.NET and MVC3
  • Visual Studio 2010 Team Suite as an IDE
  • StyleCop and ReSharper for enforcing code standards
  • Agile development methodology, with Scrum used for large features and a Kanban taskboard for smaller tasks
  • Jenkins + NUnit for testing and continuous integration
  • Sauce On Demand – Selenium for automation testing

Software And Technologies Used

  • Windows Server 2008 R2 x64: Operating System
  • SQL Server 2005 running under Microsoft Windows Server 2008 Web Server
  • Equalizer load balancers: for load balancing
  • Redis: used as the distributed caching layer and as a pub/sub message queue
  • NodeJS: for real-time analytics and updating the studio dashboard
  • ElasticSearch: for show search
  • Sawmill + custom parser scripts: for log analysis

Monitoring

  • NewRelic for performance monitoring 
  • Chartbeat for measuring the impact of performance on KPIs (conversions, page views)
  • Gomez, WhatsupGold, Nagios for various alerting
  • SQL Monitor: from Red Gate - for SQL Server monitoring

Our Approach

  • “Be brief, be bright, be gone” : Respect another person’s time. Don’t come with problems, come with solutions.
  • Don’t go chasing hot technologies of the day. Instead, ‘mitigate your top problems’. We do adopt new technologies, but only when the business case requires it. The appetite for production outages decreases significantly when you have millions of users.
  • Achieve “essential”, then worry about “excellent”.
  • Be a “how team” instead of a “no team”.
  • Build security into the software development lifecycle.  You need to train developers on how to write secure software and make it a business priority from the start.

Architecture

  • All JavaScript, CSS and images are cached at the CDN level. The DNS points to a CDN, which passes requests to origin servers. We use Cotendo because it allows us to make L7 routing decisions at the CDN.
  • Separate clusters of web servers are used to serve requests for regular users and for ad users, differentiated by a cookie.
  • We are moving towards a service-oriented architecture where key pieces of the system, such as search, authentication, and caching, are RESTful services implemented in various languages. These services also provide a caching layer.
  • The Redis NoSQL key-value store (redis.io) is used as a cache layer in front of database calls.
  • ScaleOut is used to maintain session state across a garden of web servers. However, we are considering switching to Redis.
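Putting a cache "in front of database calls" is usually implemented as the cache-aside pattern: read from the cache, and on a miss load from the database and populate the cache with an expiry. Below is a minimal sketch of that pattern. It assumes a plain in-process dict with expiry timestamps standing in for Redis (against a real Redis you would use the client's GET and SETEX commands instead), and `load_show_from_db` is a hypothetical stand-in for the SQL Server query. It is not Cinchcast's code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside: read from the cache first, fall back to the database on a miss."""

    def __init__(self, load: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load = load              # the (expensive) database call
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self.db_calls = 0

    def get(self, key: str) -> str:
        entry: Optional[Tuple[str, float]] = self.store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]           # cache hit: no database round trip
        value = self.load(key)        # miss or expired entry: go to the database
        self.db_calls += 1
        self.store[key] = (value, self.clock() + self.ttl)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a database write so readers do not keep seeing stale data."""
        self.store.pop(key, None)


def load_show_from_db(show_id: str) -> str:  # hypothetical stand-in for a SQL query
    return f"show:{show_id}"


now = [0.0]  # fake clock so the expiry behavior is easy to follow
cache = CacheAside(load_show_from_db, ttl_seconds=60, clock=lambda: now[0])
cache.get("42")   # miss: one database call
cache.get("42")   # hit: served from the cache
now[0] = 61.0
cache.get("42")   # TTL expired: the database is queried again
```

The TTL bounds how stale a cached value can get, while explicit invalidation after writes keeps hot keys fresh without waiting for expiry.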


Lessons Learned

  • Text search in SQL Server doesn’t work well. It was clogging up the CPU, so we switched to ElasticSearch (a Lucene derivative).
  • The built-in Microsoft session module is prone to deadlocks, so we ended up replacing it with the AngiesList session module, which stores data in Redis.
  • Logging is key to detecting problems.
  • Reinventing the wheel can be a good thing. For example, we initially used a vendor product for bundling JS/CSS together, which started causing performance issues. We then rewrote the bundling ourselves and significantly improved the performance of our site.
  • Not all data is relational, so a database isn’t always a good medium. A good analogy is water flowing down a pipe that is wide at the top but narrows towards the bottom. The top is the web servers (there are many of them); the bottom is the databases (there are few, and they get clogged up).
  • Not using metrics in your development process is like trying to land a plane in a storm with a broken altimeter. Throughout your development process, compute metrics such as site throughput, time to fix Blocker/Critical bugs, and code coverage, and use them to gauge your performance.
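To make the last lesson concrete, here is a minimal sketch of computing one of the metrics it mentions, the average time to fix Blocker and Critical bugs, from exported bug records. The `Bug` record shape, its field names, and the sample records are assumptions for illustration, not the schema of any particular bug tracker.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Bug:
    priority: str                  # e.g. "Blocker", "Critical", "Major"
    opened: datetime
    resolved: Optional[datetime]   # None while the bug is still open


def mean_hours_to_fix(bugs: List[Bug],
                      priorities: Tuple[str, ...] = ("Blocker", "Critical")) -> Dict[str, float]:
    """Average open-to-resolved time in hours, per priority, ignoring unresolved bugs."""
    result: Dict[str, float] = {}
    for p in priorities:
        durations = [
            (b.resolved - b.opened).total_seconds() / 3600
            for b in bugs
            if b.priority == p and b.resolved is not None
        ]
        if durations:
            result[p] = mean(durations)
    return result


# Made-up sample records, only to show the calculation.
bugs = [
    Bug("Blocker", datetime(2012, 11, 1, 9), datetime(2012, 11, 1, 13)),   # 4 h
    Bug("Blocker", datetime(2012, 11, 2, 9), datetime(2012, 11, 2, 11)),   # 2 h
    Bug("Critical", datetime(2012, 11, 3, 9), datetime(2012, 11, 4, 9)),   # 24 h
    Bug("Critical", datetime(2012, 11, 5, 9), None),                       # still open
    Bug("Major", datetime(2012, 11, 6, 9), datetime(2012, 11, 6, 10)),     # not counted
]
print(mean_hours_to_fix(bugs))  # {'Blocker': 3.0, 'Critical': 24.0}
```

Tracking this number month over month, rather than looking at it once, is what turns it into the "altimeter" the lesson describes.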


The S Factor: Optimizing a Cloud-Based Platform for Scalability and Security

This Tuesday, I am flying out to the CloudBeat 2012 conference (http://venturebeat.com/events/cloudbeat2012/agenda/) to talk about how we scaled Cinchcast audio technology to handle millions of visitors.

The abstract is below. If you are in San Francisco, do stop by. Thanks to Vanessa Alvarez from Forrester for moderating.

Cinchcast is a cloud-based, enterprise solution for webcasts and conference calls of any size. On a monthly basis, Cinchcast powers 15 million audio streams and attracts over 36 million unique visitors. In this talk, we’ll discuss how Cinchcast development and production environments operate and the role of New Relic in scaling the Cinchcast platform to meet event demands. Dr. Yampolskiy will explain how Cinchcast maintains agile release cycles while monitoring for performance and security issues. He will give some concrete examples, such as how a drastic drop in page views was discovered through a monitoring tool, and how his team thwarted a DDoS attack through cloud provisioning.

Saturday, November 10, 2012

Startup Exits : a primer

A killer deck by Mark Suster. Thanks to Jatin Shah for pointing it out (cross-posted from http://jatinshah.tumblr.com/post/35405359389/startup-exits-by-mark-suster)


 

Friday, November 9, 2012

Angular

At Cinchcast, we've been investigating the use of Angular for structuring our HTML and JavaScript code. I've got to admit it's a very clean framework, and the data binding feature is amazing. You no longer have to write jQuery functions and callbacks to update the page; Angular does it all for you. By the way, we are hiring great engineers in the New York area. If you know .NET and want to work with NodeJS, Redis, MongoDB, Angular, and a slew of other technologies, drop us a note at jobs@cinchcast.com

Incident Response Policy


What happens in your company when a production incident occurs?
In a typical startup, you will see engineers running around frantically trying to resolve the problem. However, as soon as the incident is resolved, they forget about it and go back to business as usual.


A good incident response policy can help bring order to chaos. There are a few best practices to keep in mind when production outages occur:

- Having a procedure in place helps reduce the panic. Security incidents should be treated differently from production outages.
- In the report, explain the response timeline and how the problem was discovered.
- An incident report should be written on the same day the incident occurred. Otherwise, you risk forgetting what happened.
- It should have concrete follow-up actions, tracked as JIRA tickets. If you don't do this, engineers will not follow up.
- Put incident reports in a public location and compute metrics: Are incidents happening less frequently this month than the previous one? Is there any correlation between incidents? Are follow-up actions being addressed?
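The metrics in the last bullet are straightforward to compute once incident reports are collected in one place. Below is a minimal sketch, assuming each report is reduced to a date plus its follow-up tickets with a done flag. The `Incident` shape and the ticket keys are hypothetical and do not come from JIRA's API.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Incident:
    day: date
    follow_ups: Dict[str, bool] = field(default_factory=dict)  # ticket key -> done?


def incidents_per_month(incidents: List[Incident]) -> Dict[str, int]:
    """Count incidents per calendar month, keyed as 'YYYY-MM', in chronological order."""
    counts = Counter(i.day.strftime("%Y-%m") for i in incidents)
    return dict(sorted(counts.items()))


def follow_up_completion(incidents: List[Incident]) -> float:
    """Fraction of follow-up tickets that are closed (1.0 if there are none)."""
    statuses = [done for i in incidents for done in i.follow_ups.values()]
    return sum(statuses) / len(statuses) if statuses else 1.0


# Made-up sample reports, only to show the calculation.
log = [
    Incident(date(2012, 9, 4), {"INFRA-1": True}),
    Incident(date(2012, 9, 20), {"INFRA-2": True, "INFRA-3": False}),
    Incident(date(2012, 10, 11), {"INFRA-4": False}),
]
print(incidents_per_month(log))   # {'2012-09': 2, '2012-10': 1}
print(follow_up_completion(log))  # 0.5
```

A falling monthly count with a low completion rate is a warning sign: incidents may simply be going unreported rather than being prevented.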

Attached is a sample incident response template that I've used.


Incident Analysis Report

Time of Incident: 5:22AM
Time of Recovery: 5:50AM 3/15/12
Date Issue First Identified: 3/15/12
Discovered By: Alex
Incident Report Prepared By: Alex
Date: 3/15/12

I. Description of Incident:
II. AWS Statement:
2:40 AM PDT We are investigating connectivity issues for EC2 in the US-EAST-1 region.
3:03 AM PDT Between 2:22 AM and 2:43 AM PDT internet connectivity was impaired in the US-EAST-1 region. Full connectivity has been restored. The service is operating normally.


III. Business Impact: Frustrated customers, because the ACME website was inaccessible.

IV. Security Impact:
There are no known security issues related to this incident.

V. Technical Impact:
There weren’t enough servers to handle the load for new customers.

VI. Event Timeline:
5:30AM: All Amazon hosts were inaccessible
5:50AM: Service was restored

VII. Lessons Learned:
- We need to know the business impact of each server on Amazon and put DR policies and procedures in place for outages. We could also leverage the California EC2 region to help mitigate outages that affect only Virginia.

VIII. Action Items:
1. Called Amazon about EC2; they are going to alert us when they learn more about the issue (INFRA-123)
2. Identify what we can and can’t do if EC2 goes down (INFRA-345)

Thursday, November 8, 2012

Icecast security


[Image: Icecast illustration [1]]


Icecast is a server program for streaming audio in MP3 or Ogg Vorbis format, and it is very popular in the Internet radio community. Many CDNs, including Limelight, use it to stream live MP3 streams. I've been browsing the web for typical vulnerabilities affecting Icecast, and the trend looks positive. According to CVEdetails [2], the most recent vulnerability in its database dates from 2007, and the number of vulnerabilities has been declining:


Vulnerabilities by year:
  2001: 5
  2002: 2
  2004: 3
  2005: 2
  2007: 1

Vulnerabilities by type:
  Denial of Service: 5
  Execute Code: 7
  Overflow: 7
  Directory Traversal: 2
  XSS: 1
  Bypass Something: 1

References
[1] Illustration from http://livestream123.com/wp-content/uploads/icecast.jpg
[2] http://www.cvedetails.com/vendor/693/Icecast.html

Cinchcast Connect product

I am very excited that our Tech team has released the Cinchcast Connect product. A common problem in large conference calls with hundreds or thousands of participants is that you do not know who is on the line. In Cinchcast Connect, we implemented a universal PIN:
"Registered participants receive a unique PIN code to access the audio conferencing portion of corporate events hosted on the Cinchcast platform. Event participants do not have to wait on hold to be screened by operators prior to entering events. In addition, for users who may attend multiple corporate events (Employee Town Halls, Team Meetings, Earnings /Analyst Calls), once an individual has registered on the Cinchcast platform, their unique PIN code will always be the same."  [1]

Now you no longer have to guess who is on the call because names of attendees are displayed in our studio.  You will see in real-time the number of listeners on the web and callers on the phone.
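The behavior described above (one PIN per registered person, unique across participants, the same PIN for every event that person attends, and a PIN that identifies the caller by name) can be modeled with a small registry. This is a toy sketch assuming 6-digit PINs; `PinRegistry` is a hypothetical name, and this is not Cinchcast's implementation.

```python
import secrets
from typing import Dict


class PinRegistry:
    """Issues each registrant one unique PIN and returns the same PIN for later events."""

    def __init__(self, digits: int = 6) -> None:
        self.digits = digits
        self.pin_by_user: Dict[str, str] = {}
        self.user_by_pin: Dict[str, str] = {}

    def pin_for(self, user_id: str) -> str:
        if user_id in self.pin_by_user:
            return self.pin_by_user[user_id]      # returning registrant: same PIN
        # Toy approach: draw random PINs until an unused one is found.
        while True:
            pin = f"{secrets.randbelow(10 ** self.digits):0{self.digits}d}"
            if pin not in self.user_by_pin:
                break
        self.pin_by_user[user_id] = pin
        self.user_by_pin[pin] = user_id
        return pin

    def who_is(self, pin: str) -> str:
        """Look up the caller behind a PIN, e.g. to display their name in the studio."""
        return self.user_by_pin.get(pin, "unknown caller")


registry = PinRegistry()
town_hall = registry.pin_for("alice@example.com")
earnings_call = registry.pin_for("alice@example.com")  # same person, later event
bob = registry.pin_for("bob@example.com")
```

Because the PIN maps back to a registered identity, the dial-in side no longer needs an operator to screen callers before they join.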

Our player is HTML5 compliant, requires no browser plugins, works over regular HTTP port 80 so you don't need to poke holes in a firewall, and has minimal bandwidth requirements (15x-20x less than a video stream). So it turned out to be a great product.


If you are interested in trying it out, please drop us a line: http://cinchcast.com/contact/



References
[1]  http://cinchcast.com/news/cinchcast-launches-universal-pin-code-access-for-enterprise-conference-calls/




Read more here: http://www.sacbee.com/2012/10/29/4945817/cinchcast-launches-universal-pin.html#storylink=cpy

Branching Strategy

At Cinchcast Tech, we've been spending a lot of time discussing a proper branching strategy for our codebase.

We maintain dev, qa, and staging branches. All development starts locally and then gets merged into the dev branch. After testing, the QA team can merge it into the qa branch. Finally, when the code is ready to be released, it gets merged into the staging branch.



When we work on new releases, we follow one of two approaches:
1. Release branches. A separate branch is created for each release.
For example, FOO_3_1_2 branch would be created for all work done on release 3.1.2 of the FOO project.

2. Feature branches. A separate branch is created for each large component. Typically these components require isolated testing and are merged into the main branch only at the end. The naming convention is AY_MODULE, where AY is the developer's initials and MODULE is the name of the component.
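Both naming conventions are regular enough to check mechanically, for example in a CI job that rejects pushes of badly named branches. Below is a minimal sketch using regular expressions. The exact patterns (an uppercase project name, two or three letters of initials) are assumptions inferred from the FOO_3_1_2 and AY_MODULE examples, and `classify_branch` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed patterns, inferred from the FOO_3_1_2 and AY_MODULE examples.
RELEASE = re.compile(r"^(?P<project>[A-Z][A-Z0-9]*)_(?P<major>\d+)_(?P<minor>\d+)_(?P<patch>\d+)$")
FEATURE = re.compile(r"^(?P<initials>[A-Z]{2,3})_(?P<module>[A-Za-z][A-Za-z0-9]*)$")
LONG_LIVED = {"dev", "qa", "staging"}


def classify_branch(name: str) -> Optional[str]:
    """Describe a branch name, or return None if it follows no known convention."""
    if name in LONG_LIVED:
        return "long-lived"
    m = RELEASE.match(name)
    if m:
        version = ".".join((m["major"], m["minor"], m["patch"]))
        return f"release {m['project']} {version}"
    m = FEATURE.match(name)
    if m:
        return f"feature {m['module']} by {m['initials']}"
    return None
```

For example, `classify_branch("FOO_3_1_2")` gives `"release FOO 3.1.2"`, and `classify_branch("AY_MODULE")` gives `"feature MODULE by AY"`.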

All new branches are created off the staging branch, which should mirror the code that's running in production.
Any urgent hotfixes are typically made directly on the staging branch and then backported into other branches.

Any load testing or security analysis is typically done during the QA stage, once the code has been merged into the qa branch. We have a variety of scanners running 24x7 against our qa and production environments, such as McAfee Secure scanning for dynamic security vulnerabilities and NewRelic continuously checking the performance of the application. If any issues are found, the code is rolled back and cannot go into production.

Note: We are always looking to hire great software engineers. So if you are one, and are looking for an exciting environment to work at, email us at jobs@cinchcast.com



Wednesday, November 7, 2012

A nice diagram of OpenRTB ecosystem

(from http://www.iab.net/media/file/OpenRTB_API_Specification_Version2.0_FINAL.PDF)

Interesting Stats About My Gmail

GMailMeter (http://www.gmailmeter.com/) is a clever tool that analyzes your Gmail mailbox and produces detailed statistics on how you use email. Hourly and weekly volume, number of words per email, and time to respond are among the statistics it measures on a month-to-month basis.

I tried it out and within 30 minutes learned that:

- most of my emails have between 1 and 100 words (I do like to cut right to the point)
- I get a lot of emails (already knew that)
- I respond to 15% of my emails in under 5 minutes (now that's scary)
- only 59% of emails are addressed directly to me
- the number of emails I send spikes after 6pm (logical with two little kids in the house)

Overall, GMailMeter seems like a very useful tool, and I recommend that everyone try it.
Now I just need to figure out what to do with these statistics.

In the past month:

1927 conversations: 660 were important and 47 have been starred. I started 20.29% of them and replied to 6.12% of the others.
2487 emails received, from 580 people; 59.07% were sent directly to me.
739 emails sent, to 138 people.

Saturday, November 3, 2012

What Technology Stack Should My Startup Use?

Often entrepreneurs ask me 'What technology should I build my startup on?'
There is no right or wrong answer to this question. It's a decision every company makes for itself, depending on what it's trying to build and the skills of its cofounders. Nonetheless, there are a few rules that I try to adhere to:

1. Your technology choice doesn't matter much.
For early-stage startups, the main goal should be to get the application up and running as soon as possible. Then they will be able to get customers and funding, and hire great engineers. Most languages are similar to one another, and even if you discover that a particular technology choice was the wrong one, you can fix it later. For example, Facebook was written in PHP; when they ran into scalability issues, they rewrote parts of the application as services communicating over the Thrift protocol, which fixed them.

2. 'Don't chase hot technologies of the day'.
Use the technology that's right for the job, not the one that's trendy. Engineers often like to choose a technology just because it's trendy. Guess what: technology trends, just like fashion trends, come and go. For example, NodeJS is very hip right now, but it runs your code on a single thread, which makes it not the best choice for CPU-intensive computations.

3. Ask around.
Ask other people what they are doing and why. Did they store analytics information in MongoDB or in a relational database? Are they using SOLR or ElasticSearch for real-time search? Experience helps, and many technologists will be happy to lend free advice.

4. Don't use esoteric technologies where little open-source innovation is happening.
By using esoteric technologies, you will have a harder time recruiting engineers. Technologies with lots of open-source innovation happening around them (Ruby on Rails, .NET, Java, etc.) are always a good mainstream choice. Pascal, on the other hand, maybe not so much.
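The single-thread caveat in rule 2 applies to any event-loop runtime, not only NodeJS. Below is a minimal sketch of the effect using Python's asyncio as a stand-in for Node's event loop (an analogy, not Node itself): a CPU-bound function that never yields holds the loop, so a concurrently scheduled task, standing in for other users' requests, cannot run until the computation finishes.

```python
import asyncio
from typing import List


async def heartbeat(log: List[str]) -> None:
    """Stands in for other work (e.g. serving other requests) that needs the loop."""
    for i in range(3):
        log.append(f"tick {i}")
        await asyncio.sleep(0)  # yield control back to the event loop


async def cpu_heavy(log: List[str], n: int) -> int:
    total = sum(i * i for i in range(n))  # pure computation, no await: blocks the loop
    log.append("cpu done")
    return total


async def main() -> List[str]:
    log: List[str] = []
    beat = asyncio.create_task(heartbeat(log))  # scheduled, but not yet running
    await cpu_heavy(log, 2_000_000)             # runs to completion without yielding
    await beat
    return log


order = asyncio.run(main())
print(order)  # ['cpu done', 'tick 0', 'tick 1', 'tick 2']
```

The usual remedy in either runtime is to move heavy computation off the event loop, for example into a separate worker process, so the loop stays free for I/O.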

Friday, November 2, 2012

Donate to Sandy


Everyone, please go ahead and donate to support the recovery efforts from Hurricane Sandy.
GENERAL
For local Red Cross chapters:
New York
New Jersey
Connecticut
For more from the Salvation Army or to donate, visit https://donate.salvationarmyusa.org/disaster
For local Salvation Army chapters:
NEW YORK CITY

Thursday, November 1, 2012

Jiro Dreams of Sushi - Quest for Perfection

I just watched "Jiro Dreams of Sushi" on Netflix.

It's a touching documentary about Jiro Ono, an 85-year-old sushi chef, and his quest for the perfect sushi. His hole-in-the-wall restaurant holds the coveted three-star Michelin rating because of his attention to detail, love for his work, and constant striving for perfection.


In the movie, the famous food critic Yamamoto describes what it takes to be a great chef.
I believe that the same qualities apply to being a great computer scientist or entrepreneur:
A great chef generally has the following five attributes.
First, they take their work very seriously and consistently strive to perform at the highest level.
Second, they aspire to continually improve their skills. To be better today than yesterday. To be better tomorrow than today.
Third, cleanliness. If the restaurant doesn’t feel clean, the food isn’t going to taste good.
The fourth attribute is impatience. They are not prone to collaboration. They’re stubborn and insist on having things their own way.
What ties these attributes together is passion. That’s what makes a great chef.