Machine Learning with MLlib and GraphX @ TACC

Zhao Zhang

  • Supervised Learning
    • Linear Regression
    • Classification
      • Logistic Regression
      • Support Vector Machine: handles non-linear classification when used with a kernel
  • Unsupervised Learning
    • Lower-dimensional representation
      • Principal Component Analysis
    • Sparse representation
      • K-Means
      • Gaussian Mixture Models
    • Independent representation
      • Principal Component Analysis
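
As one concrete instance of the unsupervised methods listed above, here is a minimal K-Means sketch (Lloyd's algorithm) in plain Python. It is not the MLlib API: the function name `kmeans`, the helper `dist2`, and the choice to seed the centers with the first k points are assumptions made to keep the example small and deterministic.

```python
def dist2(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of (x, y) tuples.

    Returns (centers, labels). Centers start as the first k points,
    which is an assumption for determinism, not a recommended seeding.
    """
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

The two steps in the loop are the whole algorithm: assign points to the nearest center, then recompute each center as the mean of its points.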

Cost Function

  • Regularization
  • Maximum Likelihood
  • KL divergence
  • cross-entropy
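
The last two items are closely related: for discrete distributions p and q, cross-entropy equals the entropy of p plus the KL divergence from p to q. The sketch below checks that identity numerically. The function names are illustrative, natural logarithms are assumed (so values are in nats), and q is assumed to be non-zero wherever p is.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i * log(p_i)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]   # "true" distribution
q = [0.9, 0.1]   # model distribution

# Cross-entropy decomposes into entropy plus KL divergence.
gap = cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))
print(abs(gap) < 1e-12)
```

Because H(p) does not depend on the model q, minimizing cross-entropy over q is the same as minimizing the KL divergence. When p is the empirical distribution of the training data, it is also the same as maximizing the likelihood.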

Graph Processing

  • PageRank: ranks the vertices of a directed graph, by Google
  • frameworks
    • Pregel
    • Giraph
    • GraphLab
    • GraphX
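
To make PageRank concrete, here is a minimal power-iteration sketch in plain Python. It uses the textbook normalized formulation, not the API of any framework listed above. The function name, the adjacency-dict input format, and the damping factor of 0.85 are assumptions, and every link target is assumed to appear as a key in the dict.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration on a directed graph given as {vertex: [targets]}.

    Ranks are kept normalized so they sum to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every vertex receives the same "random jump" share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                # Split this vertex's rank evenly across its out-links.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling vertex: spread its rank over all vertices.
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Each pass of the loop has every vertex send rank along its out-edges and then combine what it received, which is the same message-passing pattern that vertex-centric frameworks such as Pregel are built around.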

GraphX

GraphX represents a graph as an RDD of vertices and an RDD of edges.
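
The two-collection view can be mimicked with plain Python lists standing in for the RDDs. This is only an analogy, not GraphX code: the sample data and the helper names `out_degrees` and `triplets` are assumptions, loosely modeled on the degree and triplet views that GraphX derives from its vertex and edge collections.

```python
# Vertex collection: (vertex id, attribute) pairs.
vertices = [(1, "alice"), (2, "bob"), (3, "carol")]

# Edge collection: (source id, destination id, attribute) triples.
edges = [(1, 2, "follows"), (2, 3, "follows"), (1, 3, "follows")]


def out_degrees(edges):
    """Count outgoing edges per source id by scanning the edge collection."""
    degrees = {}
    for src, _dst, _attr in edges:
        degrees[src] = degrees.get(src, 0) + 1
    return degrees


def triplets(vertices, edges):
    """Join each edge with the attributes of both of its endpoints."""
    attr = dict(vertices)
    return [((src, attr[src]), (dst, attr[dst]), a) for src, dst, a in edges]


print(out_degrees(edges))
print(triplets(vertices, edges)[0])
```

Keeping vertices and edges as separate collections means that per-vertex questions become aggregations over the edge collection, and per-edge questions become joins against the vertex collection.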

GraphX algorithms that ship with Apache Spark

  • Connected Components: org.apache.spark.graphx.lib.ConnectedComponents
  • Triangle Counting: org.apache.spark.graphx.lib.TriangleCount
  • Shortest Paths: org.apache.spark.graphx.lib.ShortestPaths
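
To show what these three algorithms compute, here is a small single-machine sketch in plain Python. It does not call GraphX. All function names are illustrative, edges are treated as undirected, every edge endpoint is assumed to appear in the vertex list, and shortest paths are measured in hops from a single landmark vertex.

```python
from collections import deque


def neighbors(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component."""
    adj = neighbors(edges)
    label = {v: v for v in vertices}
    changed = True
    while changed:  # propagate minimum labels until nothing changes
        changed = False
        for v in vertices:
            best = min([label[v]] + [label[n] for n in adj.get(v, ())])
            if best < label[v]:
                label[v] = best
                changed = True
    return label


def triangle_count(vertices, edges):
    """Number of triangles passing through each vertex."""
    adj = neighbors(edges)
    counts = {}
    for v in vertices:
        nbrs = adj.get(v, set())
        # Each triangle through v is seen twice, once from each other corner.
        counts[v] = sum(len(nbrs & adj[n]) for n in nbrs) // 2
    return counts


def hop_distances(edges, landmark):
    """Breadth-first hop counts from the landmark to reachable vertices."""
    adj = neighbors(edges)
    dist = {landmark: 0}
    queue = deque([landmark])
    while queue:
        u = queue.popleft()
        for n in adj.get(u, ()):
            if n not in dist:
                dist[n] = dist[u] + 1
                queue.append(n)
    return dist


vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]
print(connected_components(vertices, edges))
print(triangle_count(vertices, edges))
print(hop_distances(edges, landmark=1))
```

The sample graph has one triangle (1, 2, 3) with a tail to vertex 4, plus a separate edge (5, 6), so each of the three functions has something to report.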

Published: Thu 04 May 2017. By Dongming Jin
