It is difficult for traditional speech systems, including those commercially available today, to take full advantage of big datasets. For example, traditional speech systems require phonemic transcriptions in order to train their acoustic models. The process of creating a training set based on phonemic representation is complicated, since humans are not good at manually labeling phonemes. This is usually done with a multi-stage and possibly error-prone bootstrapping process. In contrast, our approach requires only text transcriptions of the utterances, which are easy for people to create. Additionally, a traditional model that outputs phoneme probabilities for use by a language model has imposed an intermediate representation: the phonemes themselves. Deep Learning has been so successful in many fields precisely by avoiding the imposition of such intermediate representations, instead learning the best intermediate representations for the job at hand, given the training data. Because our models are trained to produce character probabilities, we don’t need an explicit intermediate representation, and our networks can do a better job at learning from the data. We built Deep Speech because we saw the opportunity to re-conceive speech recognition in light of the new capabilities afforded by Deep Learning, to take advantage of even larger datasets to solve even harder problems.
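Because the model emits character probabilities rather than phoneme probabilities, training targets can be built directly from plain-text transcripts. The sketch below illustrates that idea with a hypothetical label set (the alphabet, space, and apostrophe); it is not Deep Speech's exact label scheme, just a minimal example of how text transcriptions map to training labels with no phonemic transcription step.

```python
# Minimal sketch: character-level training targets built straight from text.
# ALPHABET is a hypothetical label set, not Deep Speech's exact one.
ALPHABET = "abcdefghijklmnopqrstuvwxyz '"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def transcript_to_targets(text):
    """Map a plain-text transcript to integer labels for training.

    Characters outside the label set are dropped; no phoneme
    dictionary or forced alignment is required.
    """
    return [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]

targets = transcript_to_targets("Deep Speech")
```

A network trained against such targets learns its own internal representations of the acoustics; nothing phoneme-like is imposed on it.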
The SF Big Analytics meetups cover all aspects of big data analytics, from data ETL, feature generation, and machine learning theory, algorithms, and implementation to the technologies and infrastructure that support big data analytics: data processing (e.g. Hadoop MapReduce, Spark, Hive, Spark SQL, Pig, …), data storage, big data visualization, DevOps (e.g. Docker with Hadoop, …), and more.