Uncategorized

Ultimate guide for scraping JavaScript rendered web pages

Yasoob:

I really wanted to write a guide for this myself but didn’t get the time. Naren Arya has written a great post on it, and I think you should definitely give it a look.

Originally posted on impythonist:

We have all scraped web pages. The HTML returned in the response contains our data, and we scrape it to fetch certain results. If a web page relies on JavaScript, however, the actual data only appears after the rendering process. When we use the plain requests package in that situation, the responses that come back contain none of that data. Browsers know how to render and display the final result, but how can a program know? So I came up with a power-packed solution for scraping any JavaScript-rendered website very easily.
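To make the problem concrete, here is a minimal sketch that uses only the standard library. The HTML string is a made-up page (an assumption for illustration, not taken from the original post) whose list is filled in by a script. A parser that does not execute JavaScript sees only the empty container, which is roughly what a plain HTTP fetch would hand you.

```python
from html.parser import HTMLParser

# Hypothetical page: the <ul> stays empty until a browser runs the script.
STATIC_HTML = """
<html><body>
<ul id="results"></ul>
<script>
  document.getElementById("results").innerHTML = "<li>item 1</li><li>item 2</li>";
</script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Counts <li> start tags in the markup.

    Script bodies are treated as raw text, so markup that only exists
    inside JavaScript strings is never seen as real elements.
    """

    def __init__(self):
        super().__init__()
        self.li_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_count += 1


def count_list_items(html):
    """Return how many <li> elements a non-rendering parser can see."""
    parser = ListItemCounter()
    parser.feed(html)
    parser.close()
    return parser.li_count


if __name__ == "__main__":
    print(count_list_items(STATIC_HTML))
```

The same counter does find the items once they are present in the markup itself, which is the state a rendering step would need to produce before scraping.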

Many of us use the libraries below to perform scraping.

1) lxml

2) BeautifulSoup

I don’t mention the Scrapy or Dragline frameworks here, since their underlying basic scraper is lxml. My favorite one is lxml. Why? It has element traversal methods rather than relying on a regular-expression methodology like BeautifulSoup. Here I am going to take a very interesting example. I was so amazed to find that my article appeared in a recent PyCoders Weekly issue…
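As a rough illustration of what element traversal looks like, here is a small sketch. The markup and the `extract_links` helper are hypothetical, not from the original post. It tries lxml first and falls back to the standard library's ElementTree, which offers a similar API for the calls used here. Real pages are rarely well-formed XML, so in practice lxml's HTML parser (`lxml.html`) would be the more likely choice.

```python
try:
    from lxml import etree  # the library the post favors
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a similar API

# Hypothetical, well-formed markup used only for this example.
DOC = """<html><body>
<div class="post"><a href="/first">First post</a></div>
<div class="post"><a href="/second">Second post</a></div>
</body></html>"""


def extract_links(markup):
    """Walk the element tree and collect (href, text) for every <a>."""
    root = etree.fromstring(markup)
    links = []
    for anchor in root.iter("a"):
        links.append((anchor.get("href"), anchor.text))
    return links


if __name__ == "__main__":
    for href, text in extract_links(DOC):
        print(href, text)
```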



Writing C in Cython

Originally posted on Computational Linguistics:

For the last two years, I’ve done almost all of my work in Cython. And I don’t mean that I write Python and then “Cythonize” it with various type declarations. I just write Cython. I use “raw” C structs and arrays, and occasionally C++ vectors, with a thin wrapper around malloc/free that I wrote myself. The code is almost always exactly as fast as C/C++, because it really is just C/C++ with some syntactic sugar — but with Python “right there”, should I need/want it.

This is basically the inverse of the old promise that languages like Python came with: that you would write your whole application in Python, optimise the “hot spots” with C, and voila! C speed, Python convenience, and money in the bank.

This was always much nicer in theory than practice. In practice, your data structures have a huge influence on both the efficiency of your…



HTTP streaming of command output in Python Flask

Originally posted on Musing Mortoray:

I needed an endpoint that streamed the output of an external program to the remote client. In this article I describe how I did it and discuss a few issues I encountered. Note that if you just want to stream events back to the browser, I’ll also cover that. An external command is just what I needed, and is the more difficult case.

A simple stream

The program below is a simple Flask server. To run it you need to pip install flask shelljob. Save it to a file server.py and then run python server.py.

    import flask
    from shelljob import proc

    app = flask.Flask(__name__)

    @app.route('/stream')
    def stream():
        g = proc.Group()
        p = g.run(
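The listing in this excerpt is cut off, so the following is a separate minimal sketch of the underlying idea and not the original author's code. It swaps shelljob for the standard library's `subprocess` module and wraps a command's output in a generator that yields each line as it is produced. The `stream_command` name and the demo command are assumptions made for illustration.

```python
import subprocess
import sys


def stream_command(args):
    """Yield the command's output line by line as it is produced."""
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    )
    try:
        for line in process.stdout:
            yield line
    finally:
        # Release the pipe and reap the child even if the consumer stops early.
        process.stdout.close()
        process.wait()


# Hypothetical command for illustration: a child Python process printing three lines.
DEMO_COMMAND = [sys.executable, "-c", "for i in range(3): print('line', i)"]

if __name__ == "__main__":
    for chunk in stream_command(DEMO_COMMAND):
        print(chunk, end="")
```

In a Flask view, a generator like this can be handed to `flask.Response`, which sends each yielded chunk to the client as it becomes available. Whether the child process flushes its output promptly is up to that program, so some commands may still appear to arrive in bursts.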



How to become a programmer, or the art of Googling well

Yasoob:

Not particularly related to Python but still a good read for every programmer :)

Originally posted on okepi:

*Note: Please read all italicized technical words as if they were in a foreign language.

The fall semester of my senior year, I was having some serious self-confidence issues. I had slowly come to realize that I did not, in fact, want to become a researcher. Statistics pained me, and the seemingly endless and fruitless nature of research bored me. I was someone who was driven by results – tangible products with deadlines that, upon completion, had a binary state: success, or failure. Going into my senior year, this revelation was followed by another. All of my skills thus far had been cultivated for research. If I wasn’t going into research, I had… nothing.

At a liberal arts college, being a computer science major does not mean you are a “hacker”. It can mean something as simple as, you were shopping around different departments, saw a command line for the…

